Developing an Alexa Skill with Python

Nov 9, 2021

Read the following article for a detailed description of how to integrate our Weather API into Amazon Alexa, including a step-by-step video tutorial in which I illustrate the process of creating an Alexa Skill from scratch using the Developer Console.

Here at Meteomatics, we pride ourselves on delivering very accurate weather forecasts through a single RESTful API endpoint, which is easy to use and provides a single access mechanism to one of the world’s richest weather and climate databases (>7 petabytes). Our technologically inexperienced customers can construct URLs with the help of our step-by-step guide, whilst more experienced users will find API connectors written for all the most widely used programming languages and software. However, we’re always looking for ways to make interacting with our data even easier, and in 2021 that can mean only one thing.

Yes, Meteomatics is developing voice assistant compatibility!

Now, the key word here is ‘developing’. Even if a user only ever had one intention when speaking to a voice assistant (getting a forecast for the weekend, for example), correctly interpreting all the various ways this could be phrased (“tell me the weather for Saturday and Sunday”, “what’s it going to be like this weekend?”, …) requires a lot of trial and error. Of course, this single capability alone would be a poor use of the wealth of weather data available from Meteomatics, so we also need to interpret other phrases, corresponding to different requests, without confusing them with the other statements.

Luckily, in the last few years Amazon has created the Alexa Skills Kit (ASK) to facilitate the creation of custom tools which can then be made available to any Alexa-enabled device[1], including smartphones with the Alexa App. In this article, I’ll talk about my experience getting started with the Alexa Skills Kit using Python, demonstrate our prototype Meteomatics skill, and share some of our plans for future development.

How to create an Alexa skill

Definitions

First off, whilst I want to avoid jargon as much as possible, there are a few terms which Amazon use quite specifically in the context of the Alexa frontend[2], and it’s worth highlighting a few of these before we get carried away. An Alexa Voice Skill (AVS) – or, more commonly, simply ‘skill’ – is the name Amazon has chosen to give the tools developed for Alexa-enabled devices. Several valid ‘intents’ might fall under the broader umbrella of the skill. For instance, users of our Meteomatics skill might sometimes want to know how wet it’s going to be over the next few days, or perhaps find out about polar ice cap retreat in the last 20 years. Answers to both these questions can be found in Meteomatics data, so can both be handled by the same skill[3], but reflect quite different intents of the user (albeit ones which are hopefully quite easily distinguished).

Additionally, each intent might be subject to slight variations, like getting the forecast for the next few days as opposed to for the whole of next week. Instead of having to define a new intent for different lengths of forecasts and different start times, the same intent can be used with slight variations in the form of ‘slots’. Finally, as I mentioned in the introduction, users shouldn’t be restricted to identical syntax every time they want to ask Alexa for the same thing – after all that’s not how we humans talk. The various different ways a user might specify the same intent are called ‘utterances’, and preempting all the various utterances which should map to the same intent without overlap with other intents is part of the challenge when developing an Alexa skill.
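To make these terms concrete, here is a minimal sketch of how one intent, its slots and a few utterances fit together, written as a Python dictionary that mirrors the JSON interaction model behind the Developer Console. The slot names, slot types and sample utterances are assumptions chosen for illustration, not the actual Meteomatics model. The small helper checks that every slot used in an utterance has been declared.

```python
import re

# Illustrative only: slot names, slot types and samples are assumptions.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "meteo matics",
            "intents": [
                {
                    "name": "WeatherWhenWhereIntent",
                    "slots": [
                        {"name": "city", "type": "AMAZON.City"},
                        {"name": "country", "type": "AMAZON.Country"},
                        {"name": "date", "type": "AMAZON.DATE"},
                    ],
                    # Each sample is one utterance; {braces} mark slots.
                    "samples": [
                        "what will the weather be like in {city} {country} on {date}",
                        "on {date} what is the weather in {city} {country}",
                        "give me the forecast for {city} {country}",
                    ],
                },
            ],
        }
    }
}


def undeclared_slots(intent):
    """Return slot names used in the samples but missing from the slot list."""
    declared = {slot["name"] for slot in intent.get("slots", [])}
    used = set()
    for sample in intent.get("samples", []):
        used.update(re.findall(r"\{(\w+)\}", sample))
    return used - declared


for intent in interaction_model["interactionModel"]["languageModel"]["intents"]:
    print(intent["name"], "undeclared slots:", undeclared_slots(intent))
```

Note that the third sample omits the date slot entirely, which is exactly the situation in which a required slot would have to be prompted for.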

The Skill Developer Console

Ok, that’s enough to get us started. So, how do we go about writing an Alexa skill? Amazon’s own walkthrough suggests that you download the source code for handling processed speech (the ASK Software Development Kit, or SDK) to your PC, before configuring cloud hosting for speech-to-text processing. If, like me, you find all of this rather intimidating, then don’t panic! It’s now possible to develop your skill from start to finish entirely in the Developer Console – which can be accessed through a web browser – and doing so means your skill will be initialized with a connection to AWS Lambda right out of the gate.

The frontend

Once you’ve created a skill, you’ll see an interface with several tabs. In my example below, you can see that my skill’s name is Meteomatics, and that we are currently on the Build tab. This tab handles the frontend of the skill, and in the sidebar you can see the two intents I’ve been working on: MyWeatherNowIntent and WeatherWhenWhereIntent. Additionally, since I’ve highlighted the WeatherWhenWhereIntent, in the main window we can see the utterances[4], which also show the slots which this intent is expecting. Elsewhere in the console you can set the data types for these slots, as well as tell Alexa whether these are all necessary in order to fulfil the intent (and if so, how to prompt the user to provide them if they were forgotten).

[Screenshot: the Build tab of the Alexa Developer Console]

The purpose of the frontend, and the reason a connection to a cloud server is required at all in the skill, is to translate the speech spoken by the user into input which can be utilized by the code in the backend. This is all done inside a black box – which is just as well because natural language processing is complicated – so we can just move on to the Code tab, where the backend is kept, and look at how the response is handled.

The Backend

In case it’s not immediately obvious, I’m using Python to develop my skill – not only is it the language that I’m most familiar with, I also discovered that there are relatively few accessible resources in Python available online (most AVS developers seem to prefer JS) so I hope it will be a useful example.

There are a few things worth pointing out in the code snippet above. First, you can see that both the visible IntentHandler classes share some properties – in fact this is true of all the other Handlers too. These properties are: they inherit from the AbstractRequestHandler class; they contain a can_handle() method with identical syntax (which simply returns True/False according to whether the Handler is appropriate for the intent); they contain a handle() method within which all the functionality associated with the intent is written; and they both return a response built from handler_input.response_builder, an object which has speak() and ask() methods and a response attribute. The speak() method takes the text which you want Alexa to say as a result of the handler logic, whilst the ask() method takes any text with which you’d like Alexa to reprompt the user after a period of waiting (eight seconds by default).
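As a minimal sketch of that shared handler structure (not the skill's actual code), a single handler might look roughly like this. The spoken text is placeholder wording, and the except branch only provides simplified stand-ins so that the sketch can be run without the ASK SDK installed; in an Alexa-hosted skill the real imports are available.

```python
try:
    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.utils import is_intent_name
except ImportError:
    # Simplified stand-ins (assumptions) so the sketch runs without the SDK.
    class AbstractRequestHandler:
        pass

    def is_intent_name(name):
        def matches(handler_input):
            request = handler_input.request_envelope.request
            return getattr(getattr(request, "intent", None), "name", None) == name
        return matches


class MyWeatherNowIntentHandler(AbstractRequestHandler):
    """Handles the slot-free 'weather right now' intent."""

    def can_handle(self, handler_input):
        # True only when the incoming request is this intent.
        return is_intent_name("MyWeatherNowIntent")(handler_input)

    def handle(self, handler_input):
        # Placeholder wording; a real skill would query the weather API here.
        speak_output = "It is currently twelve degrees and cloudy."
        reprompt = "Is there anything else you would like to know?"
        return (
            handler_input.response_builder
            .speak(speak_output)  # what Alexa says
            .ask(reprompt)        # what Alexa says if the user stays silent
            .response
        )
```

Every other handler in the script follows the same shape, differing only in the intent name checked in can_handle() and the logic inside handle().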

The handler_input argument is an instance of the HandlerInput object you can see imported at the top of the script. This object is an absolute nightmare to comprehend, since its attributes are typically themselves object instances with object instance attributes of their own (see line 50 for an example of a modest number of such nested objects – many desirable attributes are nested even deeper). I wouldn’t recommend spending too long trying to get your head around this structure in your first skill: instead search the web for anything you want to achieve and copy the code[5].
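One way to keep that nesting manageable is to wrap each lookup in a small helper. The sketch below works on the JSON request loaded as a plain dictionary, and the example request is made up for illustration. In the SDK the same path is reached through attributes (handler_input.request_envelope.request.intent.slots), and ask_sdk_core.utils also offers a get_slot_value helper for this purpose.

```python
def slot_value(envelope, slot_name, default=None):
    """Return the spoken value of a slot, or default if it is absent or empty."""
    request = envelope.get("request", {})
    if request.get("type") != "IntentRequest":
        return default
    slots = request.get("intent", {}).get("slots") or {}
    value = (slots.get(slot_name) or {}).get("value")
    return value if value is not None else default


# Hypothetical request, trimmed to the fields used above.
example_envelope = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "WeatherWhenWhereIntent",
            "slots": {
                "city": {"name": "city", "value": "Edinburgh"},
                "country": {"name": "country", "value": "Scotland"},
                "date": {"name": "date"},  # the user did not say a date
            },
        },
    }
}

print(slot_value(example_envelope, "city"))
print(slot_value(example_envelope, "date", default="not provided"))
```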

Although you can see a helpers.py script in my Code tab, your first skill won’t have this by default – I simply created it to keep the functions which contact the Meteomatics API separate. What is important, though, is the requirements file. This will be initialized with access to the ASK packages, but if you want to add packages for your own purposes you will need to add to this. The easiest way of doing so is by developing the backend functionality of your skill offline in an IDE, making a new environment in which to work and installing the required packages there. Once the code is behaving, it can be moved to the Developer Console, and the requirements file can be made from the offline environment using the command

pip freeze > requirements.txt

Copy and paste these requirements at the end of the file on the Developer Console (make sure you don’t overwrite the packages which are there by default).

At the end of the lambda_function.py script you’ll see that the default Handlers are already being added to a SkillBuilder object. This is essentially how ASK compiles the various Handlers contained in the script. Make sure you add any new Handlers to the SkillBuilder, echoing this syntax, and be aware that the order in which you add them can sometimes be important!
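The reason order can matter is that each request is offered to the handlers in the order they were added, and the first one whose can_handle() returns True is used. The following is a simplified stand-alone model of that behaviour, not SDK code; the Handler class, the dispatch function and the catch-all handler are hypothetical.

```python
class Handler:
    def __init__(self, name, accepts):
        self.name = name
        self.accepts = accepts  # predicate playing the role of can_handle()

    def can_handle(self, intent_name):
        return self.accepts(intent_name)


def dispatch(handlers, intent_name):
    """Return the name of the first handler that accepts the intent."""
    for handler in handlers:
        if handler.can_handle(intent_name):
            return handler.name
    return None


weather_now = Handler("MyWeatherNowIntentHandler", lambda n: n == "MyWeatherNowIntent")
catch_all = Handler("CatchAllHandler", lambda n: True)

# Specific handler first: it gets the request it was written for.
print(dispatch([weather_now, catch_all], "MyWeatherNowIntent"))
# Catch-all first: it shadows every handler registered after it.
print(dispatch([catch_all, weather_now], "MyWeatherNowIntent"))
```

In practice this means that any handler which accepts a broad range of requests should be added after the more specific ones.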

Testing

Now we can move to the Test tab. In order to test your skill, you first need to select ‘Development’ in the dropdown box near the top of the screen. Then you can go ahead and start testing Alexa’s response to various utterances. You can get Alexa to process your speech to text or, if you’re in a busy office and feel a bit embarrassed talking to your computer all day, you can equally type sentences into the simulator.

On the left you can see an example dialogue I had with Alexa. Because some Alexa devices are always listening, it would be silly if they could respond to everything they overheard which sounded like a skill intent. Hence your skill must first be invoked with a launch phrase. I like being polite to my robots, so I ask Alexa to ‘please launch meteo matics’[6] – you don’t have to do this: just using the skill invocation name is sufficient. Now that Alexa knows to listen out for intents associated with the Meteomatics skill, I can use an utterance which maps to my MyWeatherNowIntent. Alexa’s response is somewhat wordy: because of how I coded the backend, she first tries to get my precise live geolocation; after failing to do so[7] she tries to get the address registered to the device; this is impossible because I’m on my laptop (but it does work on my phone) so she defaults back to a pre-programmed address[8] and tells me what it is; before contacting the Meteomatics API with this location and the current date/time to get some relevant weather data, which I process into intelligible speech.
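The fallback chain described here can be sketched as a list of location sources tried in order, followed by building the request URL. This is an illustration under stated assumptions rather than the skill's real backend: the two provider functions simply simulate the failures described above, the default coordinates are hypothetical, and t_2m:C (temperature at 2 m in Celsius) is used as the example parameter.

```python
from datetime import datetime, timezone

DEFAULT_LOCATION = (47.4245, 9.3767)  # hypothetical fallback coordinates


def resolve_location(providers, default=DEFAULT_LOCATION):
    """Try each (label, provider) in turn; return (label, (lat, lon))."""
    for label, provider in providers:
        try:
            location = provider()
        except Exception:
            location = None  # treat a failing lookup like a missing one
        if location is not None:
            return label, location
    return "default", default


def build_weather_url(when, parameter, location):
    """Build a URL of the form /<datetime>/<parameter>/<lat>,<lon>/json."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lat, lon = location
    return f"https://api.meteomatics.com/{stamp}/{parameter}/{lat},{lon}/json"


def no_geolocation():
    return None  # simulates a device that does not share live geolocation


def no_device_address():
    raise PermissionError("address not available")  # simulates the laptop case


label, location = resolve_location(
    [("geolocation", no_geolocation), ("device address", no_device_address)]
)
when = datetime(2021, 12, 25, 9, 0, tzinfo=timezone.utc)
print(label, build_weather_url(when, "t_2m:C", location))
```

Sending the request would additionally require the account credentials, which are left out of this sketch.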

This first intent doesn’t involve any slots. The location, date and time are all obtained either from the device or within the Python script. I wanted to see how slots worked, so designed a second intent which requires extra information from the user. I’m thinking of making my way back to my hometown for the holidays, and want to see what it’s going to be like during our traditional family Christmas morning walk, so I decided to ask Alexa. Initially this is disappointing – she says she can’t find the answer I’m looking for – but the failure is actually due to the previous session having ended and Alexa attempting to use the built-in weather skill instead of the Meteomatics one. (This could be avoided if I reprogrammed the SessionEndHandler in the backend).

So, I rephrase my question, this time combining the launch phrase and query. Now Alexa is responding as expected, but we still don’t quite have our forecast: I forgot to say when I wanted the information for. Because I made this a required slot for this intent, Alexa prompts me for the time, which I provide, and Alexa gives me the prognosis. Looks like it will be cold but not too crazy[9] – I’ll book my travel!

Debugging

That all went smoothly, but I’d caution you not to expect your first skill to work right out of the box. Debugging Alexa can be a bit of a nightmare, since the JSON request and response for each utterance don’t actually contain that much information on what might be broken in your backend. As such, I should quickly demonstrate the most useful debugging tool that I found while working in the Developer Console.

Back on the Code tab, at the top of the window you can see a menu option called CloudWatch Logs[10]. This is where Alexa stores information on the backend operations of your skill. A new log is automatically created for every session, and contains timestamped reports on each request sent by Alexa, as well as Python console error output if your code crashes. I also imported the logging module and logged several important checks at the .info level. Here’s a snapshot of the log from our previous conversation with Alexa:
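For reference, logging checks like these needs nothing beyond the standard library. The following is a minimal sketch with hypothetical function names and messages, showing one check logged at the info level and one failure logged with its traceback.

```python
import logging

# Has no effect if the root logger already has handlers; useful for local runs.
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def log_location_choice(source, latitude, longitude):
    """Record which location source was used and return the logged message."""
    message = f"Using {source} location: lat={latitude:.3f}, lon={longitude:.3f}"
    logger.info(message)
    return message


def mean_or_none(values):
    """Return the mean, logging the traceback if the input is unusable."""
    try:
        return sum(values) / len(values)
    except (TypeError, ZeroDivisionError):
        logger.exception("Could not average values: %r", values)
        return None


log_location_choice("default", 47.5, 9.25)
mean_or_none([])  # logs an error with traceback instead of crashing
```

Logging the intermediate decisions in this way makes it much easier to see where a session went wrong than reading the JSON request and response alone.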

Watch the video tutorial in which I illustrate the process of creating a skill using the Developer Console from scratch.

If you have any questions about following my workflow, contact [email protected] and I’ll get back to you.

Further Developing the Meteomatics Skill

What I’ve hopefully shown so far in this article is that, once you get past the jargon and understand how to build and debug an Alexa skill from start to finish, it is fairly straightforward to develop skills which handle multiple user intents and interact with external APIs. In the Meteomatics skill I access both the Meteomatics API and the Google geocoding API, neither of which involved anything more complicated than would be required if accessing these APIs from a typical Python script. The most challenging parts of developing a skill are a) designing a Voice User Interface (VUI) which captures the range of different ways humans might ask for the same things unambiguously, whilst avoiding overlap with other intents and b) deciding how to parse the often quite complex API response[11] into text which Alexa can read and the user can understand. I’d encourage Meteomatics customers who’ve found this article instructive to experiment with incorporating our data into your skills: we’d love to see your ideas!
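As a small illustration of point b), the sketch below reduces a list of temperature values to one short sentence for Alexa to read. The function name, the wording and the numbers are all made up for illustration; the real skill works from the richer response returned by the API.

```python
def describe_temperature(place, when_text, temperatures_c):
    """Turn a list of Celsius values into one short spoken sentence."""
    if not temperatures_c:
        return f"Sorry, I could not find a temperature for {place} {when_text}."
    low = round(min(temperatures_c))
    high = round(max(temperatures_c))
    if low == high:
        return f"In {place} {when_text} it will be around {high} degrees Celsius."
    return (
        f"In {place} {when_text} temperatures will range from "
        f"{low} to {high} degrees Celsius."
    )


print(describe_temperature("Barcelona", "next week", [11.6, 14.2, 17.8]))
```

Rounding to whole degrees and reporting only a range is a deliberate choice here: a listener cannot scan a table, so the sentence carries far less detail than the underlying data.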

Our Meteomatics skill is not yet ready for public consumption. Since it’s still in development, it’s worth wrapping up by talking about what plans we have for the skill in the future, including what ASK can currently facilitate. Similarly, if you’re a user of Meteomatics data, or simply interested in our product, we’d love to hear your ideas for useful capabilities!

A significant benefit of Meteomatics’ weather data is that it is available at incredibly high resolution. This inspired the MyWeatherNow intent, which will ultimately use device geolocation to automatically get the current situation for your exact location. Of course, on its own this isn’t a particularly useful capability – more than likely you already know what the weather is like where you are right this second. The principle could be extended, however, to get the forecast for your current location. Forecasts are somewhat difficult to provide with a VUI, since there is a lot of information to synthesize into a single sentence. Take, for example, a radio weather forecast, which summarizes the situation for a geographical area in a few sentences. Radio forecasts are prepared by humans, who are good at getting an overall feeling from complex data and translating this into words; formalizing the logic required for a computer to do the same task is much more complicated.

Perhaps a more practical use of geolocation data is in setting a weather alert or reminder. Alexa skills have the capacity to remember session attributes and recall them later or use them to determine when next to perform some action. If you’re hanging washing out to dry, or trying to plan when to leave a friend’s house to avoid getting wet, it might be useful to tell Alexa in advance to notify you half an hour before it is due to start raining. This kind of application leverages the incredibly reliable and specific weather data available from Meteomatics, and is certainly a promising one.
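The timing logic behind such an alert is simple to sketch. The example below assumes the forecast has already been reduced to a time-ordered list of (datetime, precipitation in mm) pairs; the function name, the threshold and the values are hypothetical, and actually delivering the notification to the user is a separate step that is not shown.

```python
from datetime import datetime, timedelta


def rain_alert_time(forecast, threshold_mm=0.1, lead=timedelta(minutes=30)):
    """Return when to notify the user, or None if no rain is forecast.

    forecast is a time-ordered list of (datetime, precipitation in mm) pairs.
    """
    for timestamp, precipitation_mm in forecast:
        if precipitation_mm >= threshold_mm:
            return timestamp - lead
    return None


# Made-up hourly values for illustration.
forecast = [
    (datetime(2021, 11, 9, 14, 0), 0.0),
    (datetime(2021, 11, 9, 15, 0), 0.0),
    (datetime(2021, 11, 9, 16, 0), 0.4),
]
print(rain_alert_time(forecast))
```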

Alternatively, you might be heading to a specific location and want to know what the weather will be like when you get there. Again, Meteomatics data is perfectly suited to this problem; the limiting factor here is the AVS VUI. Because of the range of different address formats worldwide, it is difficult to create a data type which would capture this information in a slot. Amazon have implemented a built-in data type for addresses in the United States, but taking this capability worldwide would depend on similar types being developed for the rest of the globe.

Less specific locations are easier to implement. For instance, in the WeatherWhenWhere intent, I require that the user provide a city and a country[12]. Here the high resolution of Meteomatics data becomes less relevant, but our API has another strength which is useful in this context, namely the ability to quickly process large quantities of weather data. In the backend for WeatherWhenWhere I actually do this by requesting temperature information over a large area and taking a spatial average. This kind of processing could potentially be useful when packing for a holiday, from simple queries such as “what’s the average daily temperature in Barcelona going to be like next week” to more complicated ones like “will the conditions[13] in Banff be good for skiing on Monday”. By combining the vast array of parameters available from Meteomatics with functionality loaned from other APIs, there’s no reason questions as complicated as “what is the best town in northern Italy with under 200,000 people living there in which to develop a real estate project?” can’t also be answered. This kind of question seems oddly specific and not something we’d be likely to program explicitly in a skill, but gives an idea of the scale of the questions which can be asked of, and answered by, the Meteomatics API.
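Taking a spatial average can be sketched in a few lines once the gridded values are in hand. The example assumes the temperatures have already been extracted into a nested list, with None marking missing cells; the function name and the numbers are made up for illustration.

```python
def spatial_mean(grid):
    """Average a 2D grid of values, skipping missing (None) cells."""
    values = [v for row in grid for v in row if v is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Made-up 2 m temperatures in Celsius on a small latitude/longitude grid.
temperature_grid = [
    [14.0, 15.0, 16.0],
    [13.0, None, 15.0],
    [12.0, 14.0, 13.0],
]
print(spatial_mean(temperature_grid))
```

A plain mean treats every grid cell equally, which is a reasonable simplification for an area the size of a city but would need area weighting over much larger regions.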

There are many more steps on the road to a complete, official Meteomatics Alexa skill. Hopefully this article has given a flavor of the kind of exciting possibilities which stretch out before us. Watch this space for the continuing development of VUIs for Alexa and other voice enabled devices, and of course, if you have any feedback about what you’d like to see next, we’d love to hear from you.

[1] Watch this space for updates on our experiments with other voice user interfaces in future.

[2] The voice user interface; the backend refers to all the code which handles the speech interpreted by the Alexa frontend

[3] Alternatively you might think having separate ‘weather’ and ‘climatology’ skills makes more sense – there’s nothing prohibiting the same Meteomatics API being used in multiple skills after all.

[4] Only one utterance so far, but I’m sure you can see the need for another which changes the order of the slots, as well as many other possibilities

[5] A number of useful Python examples can be found on GitHub.

[6] To register a one-word invocation name, you need to demonstrate proof of your intellectual property rights. Since this skill is in development, I’m using this invocation name so that Alexa responds to speech which sounds like ‘meteomatics’.

[7] Quite possibly because I haven’t coded it up correctly…

[8] Actually provided as a latitude and longitude and converted to an address using Google

[9] Of course, a forecast this far in advance is subject to a lot of change, and wind gust information is not available

[10] There are several options for server location in a dropdown menu, and it’s not always obvious which of these your skill will be communicating with: if you can’t see logs from your latest session, try the other locales

[11] For instance, a Meteomatics API response to the Python connector is usually a Pandas DataFrame, which I had to manipulate to get the numbers I thought a user would be most interested in.

[12] This is actually far too general for some locations – there are, for instance, 91 Washingtons in the USA.

[13] You could design your own interpretation of various Meteomatics parameters for this intent, or use one of the leisure indices available directly through the API