Several months ago, I wrote a blog post about how we created a voice skill for Alexa-enabled devices. Voice assistants have surged in popularity over the last decade, mostly thanks to advances in artificial intelligence. Specifically, in the domain of Natural Language Processing (NLP) – a hybrid discipline of computer science and linguistics – the advent of neural networks has made it much easier to train an AI to interpret, process and respond to human speech. In today’s article, I’m going to discuss my work with the equivalent service from Google and the software that powers it – Dialogflow – and why this tool is significantly different from the service provided by Amazon. First though, for a bit of context…

A Brief History of NLP

Although the conversion of a user’s speech into text on a computer, and the subsequent text-to-speech response by the computer to the user, are technological achievements in their own right, it is the processing of the text in between which has historically been the most challenging part of the problem.

We can think of the earliest attempts at NLP as essentially complicated flowcharts. Given a topic of conversation, it is assumed that only a certain number of questions can be asked, each with only a certain number of possible answers. This is, however, a gross oversimplification. Although humans are perfectly capable of selecting the best-matching option given their intention, the interaction feels highly robotic and not at all ‘natural’. Furthermore, anticipating every possible response to a question is an incredibly difficult task. Quite often, such approaches don’t even come close to doing so – think of your last ‘conversation’ with a telephone banking menu, for example.

Consider how ‘predictable’ your average conversation really is: even if you have a good idea of the overall meaning of your partner’s next sentence, just try to guess each word they will say, in order, before it comes out of their mouth. Clearly this is unfeasible, and so the approach was abandoned in the late 20th century, whereupon NLP shifted its focus to a statistical comprehension of a corpus. Machine learning algorithms [1] could ingest a huge number of prior examples of conversations and determine likely combinations of key words which map to an intent.

This approach, based heavily on a Chomskian interpretation of what language is and how it works, is still largely in use today. The major upgrade of the last 10 years has been the application of neural networks to the problem. Neural networks are a subset of machine learning algorithms, but are essentially black boxes to the uninitiated [2]. The results, however, are impressive: where a typical machine learning algorithm might require a corpus of thousands of books [3], a neural network might achieve similar results with just a handful of sentences.

In fact, voice assistants and NLP AIs are still a long way off emulating human interaction completely – imagine, for instance, asking your device two questions at once (and expecting an answer to both of them). But the idea that language is learned from experience, rather than from a template, and the ability to translate this idea into computer code, means that we can at least approximate a conversation with our phones today.

How Is Dialogflow Different?

What the above history hopefully makes clear is that NLP is the filling in the voice assistant sandwich. What readers of my previous blog post might be wondering, though, is: why are you doing this again? Surely the Alexa article covers what we need to know?

The answer is that, because of the way Alexa skills are developed, it’s much less straightforward to deconstruct the sandwich in case you have a speech-to-text intolerance and want to put the filling between slices of pop-up messenger instead. Because Google developed Dialogflow fully independently of its voice-recognition software, the AI which manages conversations can be integrated into third-party software. Sure, you can use it with other telephony products, such as Genesys and Twilio, but you can also use the very same software to build an entirely text-based chatbot for Facebook Messenger or Slack, amongst others, or to create a Twitter bot.

How Does It Work?

With the exception of a few changes of nomenclature, the overall recipe for a Dialogflow ‘agent’ [4] is very similar to that of Alexa: an ‘end-user’ makes an ‘expression’ (utterance), which maps to an ‘intent’, which may or may not have ‘parameters’ (slots) of given ‘entity (slot) types’. Here I’ve put the nomenclature used by Alexa in brackets after each differently named term – I again direct interested readers to my previous blog post, or simply to the Dialogflow documentation, which is easy to follow (at least as far as fulfilment – see below).
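To make the mapping concrete, here is a hypothetical sketch of how an end-user expression might be resolved into this structure. The intent and parameter names are my own invention for illustration, not part of any real agent:

```python
# Hypothetical illustration of Dialogflow's terminology: an end-user
# 'expression' resolves to an 'intent', whose 'parameters' are typed
# by 'entities'. (Alexa would call these utterance, intent, slots and
# slot types respectively.)
expression = "What's the temperature in St. Gallen?"

matched = {
    "intent": "get_weather",        # the intent the expression mapped to
    "parameters": {
        "variable": "temperature",  # entity type: weather variable
        "location": "St. Gallen",   # entity type: city
    },
}

# The agent (or its webhook) then acts on the structured result
print(f"Intent: {matched['intent']}, "
      f"variable: {matched['parameters']['variable']}")
```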

There were a few things I particularly liked about working with Dialogflow as opposed to Alexa. For one thing, the quickstarts were laid out in a logical order and were for the most part easily intelligible. I also liked the distinct ‘follow-up intent’, which lets the designer make certain intents accessible only after a preceding intent has been triggered, making it much easier to organize intents without them overlapping. I also appreciated Dialogflow’s straightforward interface for remembering parameters the user has already provided, something which was challenging to achieve in Alexa.

Fulfilment

The main difference – and for me the biggest challenge of Dialogflow – appears at the point at which we need to interact with an external information source (in this case the Meteomatics API). With Alexa, all we needed to do was write Python code in the Code tab of the Alexa Developer Console. In Dialogflow, things are slightly more complicated: instead of calling the API directly, we have to do so via a ‘webhook’.

“What’s a webhook?” I hear you cry. Good question. As I understand it, a webhook is essentially an application left listening at a given URL. It is hence contactable through the internet, and acts as a middle-man between the user (Dialogflow) and a third party (in our case the Meteomatics API). Why you’d want to do this is anyone’s guess, but it’s the Dialogflow way of accessing an external API, and who am I to question the essential wisdom of Google…

Luckily for you webplebs, I’ve included some sample code showing how I made a webhook in Python to retrieve weather data from the Meteomatics API. I’ve limited it to this trivial example as you’ll undoubtedly want to play around making your own, but contact me for more examples, or check the Meteomatics GitHub, where I’ll be uploading some soon.

from flask import Flask
import meteomatics.api as api
import datetime as dt


username = 'XXXXXXX'  # your Meteomatics API credentials
password = 'XXXXXXX'
app = Flask(__name__)


@app.route("/api-call/<variable>", methods=["POST"])
def api_call(variable):
    # Map the friendly name from the URL to a Meteomatics parameter code
    lookup = {
        'temperature': 't_2m:C'
    }
    var = lookup[variable]
    # Fetch the last 24 hours of hourly data for a single coordinate
    df = api.query_time_series([(47.43, 9.4)], dt.datetime.now() - dt.timedelta(hours=24), dt.datetime.now(),
                               dt.timedelta(hours=1), [var], username, password)
    t = df[var].iloc[-1]  # most recent value in the time series
    return f"It's {t} degrees Celsius"

I wrote this after following the quickstart in the documentation for Flask (the Python package used for creating web apps). I’d recommend having a read of it, since there is a bit more to running a webhook successfully than just writing the Python script; specifically, you have to run the app from a terminal.

Terminal running Flask app
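If you just want to check your route logic before wiring up the real API call, Flask’s built-in test client lets you exercise the app without running a server at all. The sketch below uses a stripped-down stand-in for the weather route (no Meteomatics query, so the response text is made up):

```python
from flask import Flask

app = Flask(__name__)

# Stand-in for the weather route, minus the Meteomatics query, so it
# can be exercised without credentials or a network connection
@app.route("/api-call/<variable>", methods=["POST"])
def api_call(variable):
    return f"You asked for {variable}"

# The test client issues requests in-process: no 'flask run', no open port
client = app.test_client()
response = client.post("/api-call/temperature")
print(response.get_data(as_text=True))  # → You asked for temperature
```

This is handy for catching routing mistakes (e.g. a malformed URL rule) long before Dialogflow is in the loop.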

Once we’ve configured the webhook to fetch API data and process it according to our request, all we need to do is toggle Webhook to ‘Enabled’ in the Fulfilment tab of our Dialogflow agent and point it at the URL of the webhook we created. For the purposes of experimenting, I’ve been running my Flask app on the localhost of my computer – meaning it is invisible to computers outside my network.

Webhook Enabled
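One detail worth knowing before you point the Fulfilment tab at your URL: Dialogflow doesn’t hit the webhook with a bare POST. It sends a JSON body describing the matched intent, and expects a JSON reply whose fulfillmentText field becomes the spoken or displayed response. A minimal sketch of handling that exchange follows – note that ‘variable’ is just my example parameter name, and must match whatever you called the parameter in the Dialogflow console:

```python
def handle_webhook(request_json):
    """Build a Dialogflow (ES, v2 API) webhook response from the
    request body it POSTs to the fulfilment URL."""
    query_result = request_json["queryResult"]
    intent = query_result["intent"]["displayName"]
    # 'variable' is an example parameter name -- it must match the
    # parameter name you configured in the Dialogflow console
    variable = query_result["parameters"].get("variable", "")
    # The text Dialogflow speaks/displays goes in 'fulfillmentText'
    return {"fulfillmentText": f"Looking up {variable} for intent {intent}"}

# Example request body, abbreviated to just the fields used above
sample = {
    "queryResult": {
        "queryText": "What's the temperature?",
        "parameters": {"variable": "temperature"},
        "intent": {"displayName": "get_weather"},
    }
}
print(handle_webhook(sample)["fulfillmentText"])
```

In the Flask app above, you’d parse this body from the POST request and return the dictionary as JSON instead of a plain string.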

So that’s more or less all there is to it! The two things I didn’t cover in today’s post are the hosting of the webhook on a publicly visible server and the integration of the completed app into a third-party tool. Keep your eyes peeled for these developments, which may be discussed in upcoming blog posts. In the meantime, have fun playing with Dialogflow, and feel free to contact me on [email protected] to ask any questions or share any successes you have with the tool.

Happy interfacing!

[1] Algorithms which allow a computer to learn right and wrong answers without being explicitly programmed

[2] Although this video series does a very good job of explaining the fundamentals

[3] Project Gutenberg – a free corpus often employed in beginner NLP machine learning tasks – contains about 60,000 books

[4] So named because Dialogflow wants you to think of each service you build as akin to a call-center agent: “You train them both to handle expected conversation scenarios, and your training does not need to be overly explicit”