Gidi Shperber

ChatBots vs Reality: how to build an efficient chatbot, with wise usage of NLP

Updated: Oct 7, 2019

Part 1: The Chatbot Paradigm

This article is part of “NLP for chatbots” series:

1. The Chatbot Paradigm (this post)

2. How to build a chatbot using Rasa NLU

3. How to build and improve a chat bot using Rasa Core

4. Going deeper: adding components to Rasa NLU

I started writing this post as a standalone piece, but it quickly got out of control… So this is part 1 of a 4-post series. It discusses the common paradigms for designing a chat bot. Parts 2, 3 and 4 will show a practical example of building such a bot.


After some time researching the possibilities that machine learning opens up for chat bots, I recently got to work on a massive chat-bot project with Interactbot. I’ve therefore decided to write a series of posts that discuss and demonstrate some of the abilities and limitations of NLP in chat-bots.

From my point of view, text understanding is, along with image problems, one of the two top tasks in machine learning nowadays (“top” is a bit vague… perhaps in terms of traction, effort and interest). So it was interesting for me to seriously tackle one more interesting and unsolved problem.

However, while image tasks are being impressively solved one after another, text tasks are a bit behind: a really good conversational skill, which is required for developing a proper chat bot, is not even close to being solved.

You can read about the (somewhat unripe) state of a few selected NLP problems in this article, which will be discussed in later posts.

So you may be thinking to yourself: “If NLP has so many challenges, why should anyone invest in a chatbot?” To some extent, you may be right. Even machine learning powerhouses like Apple, Amazon and Google often fail miserably with their conversational interfaces (see an example below). It seems that machine learning does not yet have an answer for this kind of problem. So why should anyone bother to work on chat-bots for commercial reasons?

The reason is the same as in most machine learning based products: although you can’t do everything perfectly, you can do something helpful. For example, you can’t build a chat bot to discuss the meaning of life, or a bot to help with complex problems, but you can definitely build a chat-bot that answers basic internet provider support questions and, with wise product design, saves a lot of time spent on tiring phone menus.

So in this post, I will discuss some of the chat-bot capabilities, and some of the ways we try to close the gap between the machine learning research and production.

The capabilities and problems with Chatbots: where to aim?

As said, although full conversational skill is still far beyond what machine learning can deliver, bots are still useful. Think of Siri, Google Home and Alexa, a few very popular chat-bot platforms: while not perfect, they are considered very good products by many people. You can use them to, among other things:

- Set a meeting

- Order food

- Order a taxi

Chat-bots can also be very useful for easy conversational tasks, like (basic) customer support and content discovery, or as a more intelligent search engine.

Now think of the last time you talked to a support representative, explained your problem for the 1,000th time, and got an answer that the representative was repeating for the 10,000th time.

There are many monotonous tasks that could be replaced by a basic conversational skill with a few dozen or a few hundred prescribed answers.

Since the “conversational skill” problem cannot be solved yet, chat-bot builders have to be creative and design a pipeline of tasks which, when combined with some business rules and search heuristics, may yield useful chat-bots.

Intent-Entity paradigm

Hopefully you are able to see the potential in chat-bots, in spite of the possible flaws. Now let’s discuss how to build one.

Chatbots and NLP

Although I didn’t say so explicitly, a chat-bot does not necessarily have NLP in it. If on one end of the chat-bot scale there is the “full conversational” bot, which has human-like conversational skills, on the other end there is the deterministic bot, with a predefined conversational tree (usually 3–4 layers deep) based on many if-else statements and “like %x%” SQL queries.
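To make the deterministic end of the scale concrete, here is a minimal sketch of such a bot: one keyword-based if-else layer followed by a “like %x%” lookup. The FAQ table, its contents and the function name are made up for illustration, and a real tree would have more layers.

```python
import sqlite3

# Hypothetical FAQ table for an internet provider support bot.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faq (question TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO faq VALUES (?, ?)",
    [
        ("How do I reset my router?", "Hold the reset button for 10 seconds."),
        ("How do I pay my bill?", "Use the Billing page in your account."),
    ],
)


def deterministic_reply(user_text):
    """Pick a branch by keyword, then answer with a LIKE lookup."""
    text = user_text.lower()
    # First layer of the tree: plain if-else on keywords.
    if "router" in text:
        keyword = "router"
    elif "bill" in text:
        keyword = "bill"
    else:
        return "Sorry, I did not understand. Please choose: router, bill."
    # The "like %x%" step: substring match against the stored questions.
    row = conn.execute(
        "SELECT answer FROM faq WHERE question LIKE ?", (f"%{keyword}%",)
    ).fetchone()
    return row[0] if row else "Sorry, I have no answer for that yet."


print(deterministic_reply("My router is broken"))
```

No NLP is involved here: any phrasing that misses the exact keywords falls through to the fallback message, which is the gap the NLP components listed next try to close.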

Starting from this basic mechanism, NLP may be useful in the following parts:

- User input/question classification

- Better word/entity recognition

- State recognition (where are we in the tree)

- Answer generation

Intent-Entity paradigm explained

After doing some research, I found that a good way to go is the Intent-Entity paradigm, which, as implied by its name, works in two steps: intent classification and entity recognition.

We assume that we know where we are in the conversation flow, and ignore state, memory, and answer generation, some of which will be discussed in the next posts.

This paradigm makes it easier for non-technical people to design and train bots, and it is used by most of the well-known chat-bot interfaces.

This concept may not be an NLP task per se, but rather a pipeline of NLP tasks. Intent classification is related to text classification with different starting conditions, and entity recognition parallels named entity recognition, again under different conditions.

Let’s look at an example: a restaurant search bot, loosely based on the basic example here. This bot will have the following possible capabilities:

- Restaurant search: the user is looking for a specific restaurant or a list of restaurants

- Table order: the user wants to order a table in some restaurant

- General query: the user has a specific query regarding a restaurant, e.g. whether it’s vegan friendly, or whether it’s kosher

Additionally, we would like to find the following objects in the user query, if they are mentioned:

- Cuisine: the cuisine type, e.g. Italian, Asian, etc.

- Location: the location of the desired restaurant

- Attribute: attributes of the restaurant, e.g. vegan, kosher, accessibility
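The intents and entities above can be written down as a small schema. This is only a sketch: the class names and the identifiers `table_order` and `general_query` are my own choices for illustration (only `restaurant_search` and `cuisine` appear in the data later in this post).

```python
from dataclasses import dataclass, field
from enum import Enum


class Intent(Enum):
    RESTAURANT_SEARCH = "restaurant_search"
    TABLE_ORDER = "table_order"        # hypothetical identifier
    GENERAL_QUERY = "general_query"    # hypothetical identifier


class EntityType(Enum):
    CUISINE = "cuisine"
    LOCATION = "location"
    ATTRIBUTE = "attribute"


@dataclass
class ParsedQuery:
    """What the bot logic receives after the two steps: one intent, zero or more entities."""
    text: str
    intent: Intent
    entities: dict = field(default_factory=dict)  # EntityType -> value found in the text


example = ParsedQuery(
    text="is there a vegan italian place in the city center?",
    intent=Intent.RESTAURANT_SEARCH,
    entities={EntityType.CUISINE: "italian", EntityType.ATTRIBUTE: "vegan"},
)
print(example.intent.value, sorted(e.value for e in example.entities))
```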

Intent classification

In this paradigm, intent means the general purpose of the user query, e.g. searching for a business or a place, setting a meeting, etc.

The bot should categorize your query and act accordingly: when searching for a place, it should get as many details as possible; when setting a meeting, it should request the meeting details, attendants, etc. So it’s easy to see that this is a plain text classification challenge.

Text classification is a well-studied machine learning task. However, a big part of the research is conducted on lenient problem settings with only a few classes, such as sentiment analysis. In real-world bots, you almost never have fewer than 5 possible intents.
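To show what “plain text classification” means here, below is a toy intent classifier: each intent gets a bag-of-words profile, and a query is assigned to the most similar profile by cosine similarity. This is a deliberately simple stand-in, not the model Rasa uses, and the training sentences and intent names are made up.

```python
import math
from collections import Counter, defaultdict

# Made-up training queries for the restaurant bot.
TRAINING = [
    ("show me chinese restaurants", "restaurant_search"),
    ("find an italian place nearby", "restaurant_search"),
    ("looking for restaurants in the city center", "restaurant_search"),
    ("book a table for two tonight", "table_order"),
    ("i want to reserve a table", "table_order"),
    ("is this restaurant vegan friendly", "general_query"),
    ("is the place kosher", "general_query"),
]


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples):
    """Sum the word counts of all training queries per intent."""
    profiles = defaultdict(Counter)
    for text, intent in examples:
        profiles[intent].update(bag_of_words(text))
    return profiles


def classify(text, profiles):
    """Return (intent, score) pairs, best match first."""
    query = bag_of_words(text)
    scored = sorted(((cosine(query, p), intent) for intent, p in profiles.items()), reverse=True)
    return [(intent, score) for score, intent in scored]


profiles = train(TRAINING)
print(classify("reserve a table for four", profiles)[0][0])
```

The ranked list mirrors the idea of an intent ranking with confidences; the bot can act on the top intent or ask the user to rephrase when the top score is low.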

Without being too specific, the accuracy of such a model is dependent on various parameters:

- Intent count: the number of intents for one app should be around 5–10. Fewer intents will be too simplistic, while more intents will harm the accuracy.

- Data magnitude and quality: as in any machine learning task, the more data we have, and the closer it is to the queries seen at inference time, the better the results

- Transfer learning possibility: transfer learning, or in other words, using a model pretrained on a similar problem, may be very helpful if available

- Inference input size: users don’t tend to be concise when querying our bot. Therefore text summarization tools, along with cues for users to be short and to the point, will help our app achieve better accuracy

If we are able to “optimize” the above hyperparameters, we will usually be able to reach around 80% accuracy without too much effort, and can start striving toward 90%, which is challenging but may comfortably be considered production ready. Results lower than 80% will make for a frustrating product.
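The accuracy figures above are simply the share of held-out queries whose predicted intent matches the tagged one. A small sketch of that measurement, with made-up labels, also broken down per intent so that a weak intent is not hidden by the overall number:

```python
from collections import Counter


def intent_accuracy(true_intents, predicted_intents):
    """Return (overall accuracy, accuracy per true intent)."""
    total, correct = Counter(), Counter()
    for truth, predicted in zip(true_intents, predicted_intents):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_intent = {intent: correct[intent] / total[intent] for intent in total}
    return overall, per_intent


# Made-up held-out labels and predictions.
truth = ["restaurant_search", "restaurant_search", "table_order", "table_order", "general_query"]
predicted = ["restaurant_search", "restaurant_search", "table_order", "general_query", "general_query"]
overall, per_intent = intent_accuracy(truth, predicted)
print(overall, per_intent)
```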

Entity Recognition

An entity in text may be a business, location, person name, time, etc.: an object that has a meaning in the query, and will have further meaning in the bot logic.

Entity recognition is also a well-known NLP problem in its own right, and it is one of the annoying ones: it depends very strongly on datasets and heuristics (e.g. capital letters, question marks).
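As a rough illustration of how much such heuristics carry, here is a sketch that combines a word list (the “dataset”) with a capital-letter rule. The word lists and the rule are my own assumptions for the restaurant example, not what any library does.

```python
import re

# Hypothetical word lists for the restaurant bot.
GAZETTEER = {
    "cuisine": {"chinese", "italian", "asian"},
    "attribute": {"vegan", "kosher"},
}


def lookup_type(word):
    for entity_type, vocabulary in GAZETTEER.items():
        if word.lower() in vocabulary:
            return entity_type
    return None


def find_entities(text):
    """Tag known words by list lookup; guess 'location' for other capitalized words."""
    entities = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group()
        entity_type = lookup_type(word)
        # Capital-letter heuristic: a capitalized word that does not open the text.
        if entity_type is None and word[0].isupper() and match.start() > 0:
            entity_type = "location"
        if entity_type is not None:
            entities.append(
                {"start": match.start(), "end": match.end(), "value": word, "entity": entity_type}
            )
    return entities


print(find_entities("I want vegan food in Berlin"))
```

The rule is brittle on purpose: a mid-sentence “I” or a brand name would also be tagged as a location, and a lowercase “berlin” would be missed, which is exactly why this task is annoying.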

There are several libraries that address this task, e.g. my favourite, spaCy, which does a pretty good job.

Here you can see one of spaCy’s creators discussing their NER in depth.

Rasa NLU

Rasa NLU, mentioned earlier, is a good open-source library for finding intents and entities in text.

When approaching a machine learning problem in an organization for the first time, it is good practice to use a ready-made open-source tool, and then to build a good infrastructure around it.

Rasa NLU claims to give you exactly what the paid/black-box libraries (mentioned earlier) give you, and more. I must admit that I’ve only conducted some basic comparisons, but as you will see, Rasa NLU’s results are objectively pretty good.

Furthermore, being open source allows you to look into the code and understand what methods are being used (and perhaps get a bit disappointed when you see the simplicity of the models), and later to develop extra components by yourself (this option is possible but not documented in Rasa).

Additionally, organizations see high value in an in-house system, and the open-source Rasa NLU makes it possible to take it as a basis and develop more capabilities on top.

At first, Rasa NLU seems a bit like a black box: you train a model with a small dataset in a specific format, and then it infers intents and entities.

I must admit that Rasa’s documentation may be quite confusing at times, but a few hours of thorough examination of the code will reveal most of its “secrets”.


Rasa NLU is fed with queries, tagged with intents and entities in the following format:

  {
    "text": "show me chinese restaurants",
    "intent": "restaurant_search",
    "entities": [
      {
        "start": 8,
        "end": 15,
        "value": "chinese",
        "entity": "cuisine"
      }
    ]
  }

As with all machine learning problems, the more data you have, the better the model you get. However, some of Rasa’s components might be very slow, and very limited in terms of the number of training examples. On the other hand, reasonable results start to emerge even with a few hundred examples.
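Since those few hundred examples are tagged by hand, the character offsets are an easy place for mistakes. A minimal sanity check for the format shown above could look like this; it assumes that `value` is the literal substring between `start` and `end`, which may not hold if you tag a normalized value (such as a synonym) instead.

```python
def misaligned_entities(example):
    """Return the entities whose offsets do not select their value in the text."""
    text = example["text"]
    return [
        entity
        for entity in example.get("entities", [])
        if text[entity["start"]:entity["end"]] != entity["value"]
    ]


good = {
    "text": "show me chinese restaurants",
    "intent": "restaurant_search",
    "entities": [{"start": 8, "end": 15, "value": "chinese", "entity": "cuisine"}],
}
# Same example with a deliberately wrong span.
bad = {
    "text": "show me chinese restaurants",
    "intent": "restaurant_search",
    "entities": [{"start": 0, "end": 7, "value": "chinese", "entity": "cuisine"}],
}
print(misaligned_entities(good), misaligned_entities(bad))
```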

Base models

In general, Rasa uses two “language models” interchangeably, MITIE and spaCy, along with the ubiquitous sklearn.

MITIE and spaCy are very different from each other: the first one uses more general-purpose language models, and is therefore very slow to train, while spaCy uses more task-specific models and is very fast to train.

At Interactbot, we first started using MITIE for technical reasons, but quickly moved to spaCy due to its training speed. The results of both packages were pretty similar.


As said, Rasa is based on a pipeline of NLP tasks. The pipeline is not necessarily linear, and different components output different things. The (pretty deficient) documentation may be found here.

As you can see in the documentation, it is possible to manually assemble a pipeline, but it is recommended to start with one of the predefined pipelines. Let’s examine the spaCy-sklearn pipeline:

["nlp_spacy", "tokenizer_spacy", "intent_entity_featurizer_regex", "intent_featurizer_spacy", "ner_crf", "ner_synonyms", "intent_classifier_sklearn"]

There is a bit of clutter in this pipeline, but the important components are intent_classifier_sklearn, which uses an SVM on sentence features (based on word2vec), and ner_crf and ner_synonyms, which predict the entities. They will be further discussed in the next post.

The above components are fed with features by the “intent_entity_featurizer_regex” (regex features) and the “intent_featurizer_spacy” (word2vec features).
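The general idea of such a pipeline can be sketched in a few lines: every component reads from and writes to a shared message, so featurizers, the entity extractor and the intent classifier each contribute a different output. The components below are simplified stand-ins that I made up for illustration; they are not Rasa’s classes or its internal API.

```python
import re


def tokenizer(message):
    message["tokens"] = message["text"].lower().split()


def regex_featurizer(message):
    # Hypothetical regex feature: does the query mention a number (e.g. party size)?
    message["regex_features"] = {"has_number": bool(re.search(r"\d", message["text"]))}


def entity_extractor(message):
    cuisines = {"chinese", "italian"}
    message["entities"] = [t for t in message["tokens"] if t in cuisines]


def intent_classifier(message):
    booking_words = {"book", "reserve", "table"}
    is_booking = bool(booking_words & set(message["tokens"]))
    message["intent"] = "table_order" if is_booking else "restaurant_search"


# Order matters only where a component needs another component's output.
PIPELINE = [tokenizer, regex_featurizer, entity_extractor, intent_classifier]


def run_pipeline(text):
    message = {"text": text}
    for component in PIPELINE:
        component(message)
    return message


print(run_pipeline("show me chinese restaurants"))
```

Note that the entity extractor and the intent classifier do not depend on each other, only on the tokens, which is one sense in which the pipeline is not strictly linear.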


The output of running a Rasa model on text is the following prediction, which should then be passed on to business logic and database queries to provide the right answer to the user:

    {
        "text": "I am looking for Chinese food",
        "entities": [
            {"start": 8, "end": 15, "value": "chinese", "entity": "cuisine", "extractor": "ner_crf"}
        ],
        "intent": {"confidence": 0.6485910906220309, "name": "restaurant_search"},
        "intent_ranking": [
            {"confidence": 0.6485910906220309, "name": "restaurant_search"},
            {"confidence": 0.1416153159565678, "name": "affirm"}
        ]
    }
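One possible way to pass such a prediction on to business logic is sketched below. The confidence threshold, the action names and the handler itself are assumptions for illustration; only the field names (`intent`, `confidence`, `name`, `entities`, `entity`, `value`) come from the output above.

```python
CONFIDENCE_THRESHOLD = 0.5  # hypothetical cut-off, to be tuned per bot


def handle_prediction(prediction):
    """Turn a parsed query into an action for the rest of the application."""
    intent = prediction["intent"]
    if intent["confidence"] < CONFIDENCE_THRESHOLD:
        # Not sure enough about the intent: ask instead of guessing.
        return {"action": "ask_to_rephrase"}
    if intent["name"] == "restaurant_search":
        # Entities become filters for the database query.
        filters = {e["entity"]: e["value"] for e in prediction["entities"]}
        return {"action": "search_restaurants", "filters": filters}
    return {"action": "fallback"}


prediction = {
    "text": "I am looking for Chinese food",
    "entities": [{"value": "chinese", "entity": "cuisine"}],
    "intent": {"confidence": 0.6485910906220309, "name": "restaurant_search"},
}
print(handle_prediction(prediction))
```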


In this post, which is the first part of the series, we went over the intent-entity paradigm for chatbots. We got ourselves familiar with the Rasa NLU package, and some of its models.

In the following post, we will learn how to actually build a conversational interface with Rasa NLU and other tools. We will also discuss the evaluation and improvement of the models used.

In the last post, we will integrate a new package, Rasa Core, which will allow us to add new features such as memory, graph inference, and more cool features.

Hope you’ve enjoyed this post! If so, feel free to follow me here or there: Twitter, LinkedIn

Special thanks to Harry Hornreich and all the guys at Interactbot
