Voice-enabled-everything is a dominant theme at any tech show. We have moved on from mere voice assistants provided by Google and Amazon to voice-controlled vacuum cleaners, showers, heating and lighting – and even the Alexa-enabled toilet. Our cars aren’t immune either – many now come equipped with voice-powered features. You can, for instance, dictate emails and text messages through your car as you drive without even taking your hands off the wheel.
While this is exciting and a potential time-saver, as more devices acquire voice control, the need for natural language processing (NLP) becomes fundamental. Who wants an email dictation system that routinely sends erroneous messages because it doesn’t understand the nuances of what you say? When it comes to voice bots and interactive applications such as Google Duplex, an accurate understanding of language and conversation is essential.
Without NLP, we’re talking but not being fully understood
Currently, the most seamless and useful applications are driven by turn-taking dialogue systems that enable the end-user to speak or type a sentence, which is digested and pulled apart to determine the speaker’s intention. There are also early attempts at conversational dialogues where a speaker can continue to ask questions or issue commands with the illusion of a more natural back-and-forth conversation taking place. But the reality is that this is often little more than simple context being remembered between dialogue turns within a predetermined decision tree.
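To make the limitation concrete, here is a minimal sketch of that kind of predetermined decision tree; the node names, intents and keyword matching are purely illustrative, not any particular vendor’s implementation:

```python
# Illustrative sketch of a rigid, turn-taking dialogue flow: each utterance is
# matched against a small set of expected intents, and the only "context" is
# the current node in a predetermined decision tree.

DECISION_TREE = {
    "start": {
        "prompt": "Would you like to order food or book a table?",
        "order food": "choose_dish",
        "book table": "choose_time",
    },
    "choose_dish": {
        "prompt": "We have steak or salmon. Which would you like?",
        "steak": "confirm",
        "salmon": "confirm",
    },
    "choose_time": {"prompt": "What time would you like the table for?"},
    "confirm": {"prompt": "Is that your order?"},
}

def detect_intent(node: str, utterance: str) -> str:
    """Naive keyword matching stands in for real intent detection."""
    for intent in DECISION_TREE[node]:
        if intent != "prompt" and intent in utterance.lower():
            return intent
    return "unknown"

def next_node(current: str, utterance: str) -> str:
    """Anything the tree does not expect leaves us at the same node,
    the bot's equivalent of saying that's not an option."""
    intent = detect_intent(current, utterance)
    return DECISION_TREE[current].get(intent, current)
```

Everything the user might say has to be anticipated up front, which is exactly the rigidity NLP is now moving away from.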
It works both ways – NLP moves us toward truer dialogue
NLP is now moving away from these rigid, tree-based decision models and evolving the capability to detect multiple layers of intent and sentiment, along with recall of long-term context. Simply put, NLP opens the door to more human-like conversations with bots, which don’t just react to longer sentences but also draw on memory recall.
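As a rough illustration of what “multiple layers of intent plus sentiment” might look like in practice, the sketch below combines an off-the-shelf zero-shot classifier with a sentiment pipeline and a simple running history. The model choices, candidate intent labels and the 0.5 threshold are assumptions made for the example, not a description of any specific product:

```python
from transformers import pipeline

# Multi-label zero-shot classification lets one sentence carry several intents;
# a separate sentiment model adds the emotional layer. Model names are common
# defaults, chosen here purely for illustration.
intent_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
sentiment_classifier = pipeline("sentiment-analysis")

history: list[str] = []  # "long-term context" reduced to an accumulated transcript

def analyse_turn(utterance: str) -> dict:
    history.append(utterance)
    intents = intent_classifier(
        utterance,
        candidate_labels=["place an order", "ask for information", "complain", "chit-chat"],
        multi_label=True,  # do not force a single winning intent
    )
    sentiment = sentiment_classifier(utterance)[0]
    return {
        "intents": [label for label, score in zip(intents["labels"], intents["scores"]) if score > 0.5],
        "sentiment": sentiment["label"],
        "turns_so_far": len(history),
    }

# e.g. analyse_turn("The steak was cold last time, but I'll risk it again - medium, please.")
# should surface both a complaint and an order, with the sentiment attached.
```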
Think about dining at your favourite restaurant. If you were to sit down and there was no menu on the table, how would you order? What if you asked your waiter several questions and each time the response was: “Yes, we have that. Is that your order?” or “That’s not an option.” Or, what if you ordered steak and the waiter immediately gave you a full rundown of why the salmon is a great choice? To say you’d be frustrated is an understatement.
Traditional bot and voice menus are guessing games like this. Unless the information is explicitly stated, you don’t know what the alternatives are. Sometimes, however, when the alternatives are stated, there is far too much context that isn’t relevant to your specific need. For example, you don’t need a detailed description of the side dishes available with the chicken if you have just ordered a salad.
With NLP, an organisation could create adaptive menus that simply ask: “How can we help?” What comes next is driven entirely by your responses and needs. You could simply say: “I want the steak cooked medium with broccoli, please,” or “Do you have any good seafood?” The system understands your response and determines the next action. That could mean confirming the order or taking exploratory action, like asking further questions to determine which seafood you like. The context of the conversation is driven by your responses and needs – it is not mandated by a rigid menu.
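A minimal sketch of this adaptive, response-driven flow might look like the following; the parse() function is a stand-in for a real NLU service, and the dishes and intents are invented for the example:

```python
# Sketch of an "adaptive menu": the next action is chosen from what the guest
# actually said, not picked from a fixed list of options. parse() is a stub
# standing in for a proper NLU call.

def parse(utterance: str) -> dict:
    text = utterance.lower()
    if "steak" in text or "salmon" in text:
        return {"intent": "order", "dish": "steak" if "steak" in text else "salmon"}
    if "seafood" in text:
        return {"intent": "explore", "category": "seafood"}
    return {"intent": "unclear"}

def respond(utterance: str) -> str:
    parsed = parse(utterance)
    if parsed["intent"] == "order":
        return f"One {parsed['dish']}, noted. Anything to go with it?"
    if parsed["intent"] == "explore":
        return "We do: grilled salmon and a seafood linguine today. Any preference?"
    return "How can we help?"

# respond("I want the steak cooked medium with broccoli, please")  -> confirms the order
# respond("Do you have any good seafood?")                         -> explores options
```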
Feelings, expressions, emotional intelligence and affective computing
Ultimately, our perception of what people say in a conversation is often influenced by tone, facial expression, body movement and speed of speech, which is where NLP currently has limitations.
Affective computing could augment NLP and its ally, natural language understanding (NLU), so that systems and devices can recognise, interpret, process and simulate human affect. Facial expressions of anger, disgust, contempt, fear, sadness and surprise seem to be almost universally recognised. We have yet to see technology reliably read a human’s feelings and moods – much less quantify them into unique data points or profiles. With affective computing in the mix, however, digital interactions can be humanised, made more conversational and less one-sided.
Think of all the times we ask questions such as: “Don’t you think what Jack said was ridiculous?” We are not really interested in the “yes” or “no” answer but are stating an opinion that seeks a response about either Jack, the feasibility of what he suggested or the situation giving rise to it.
We go back a long way – remembering context is key to the future
To precisely understand even a simple statement, we need not just an accurate interpretation but contextual awareness across the entire conversation, and sometimes those preceding it. That’s where long-term context recall, working alongside affective computing, comes in, enabling a bot or voice application to “remember” previous dialogue and understand how it relates to the topic currently under discussion. Although machines can interpret questions based on word-by-word understanding, this is not how humans naturally discuss a subject that matters to them. When it comes to figurative speech, or when emotions modify the meaning of sentences, machines require more than semantic and sentiment analysis.
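One very reduced way to picture that kind of cross-conversation memory is a per-user store of earlier turns that every new interpretation is made against; the interpret() function below is a placeholder for whatever NLU and affective models a real system would use:

```python
from collections import defaultdict

# Per-user memory that survives beyond a single conversation. In a real system
# this would hold more than raw utterances (entities, resolved topics, moods),
# but the shape of the idea is the same.
memory: dict[str, list[str]] = defaultdict(list)

def interpret(utterance: str, context: list[str]) -> str:
    """Placeholder: a real implementation would resolve references such as
    "that", "he" or "the one we discussed last week" against the context."""
    return f"{utterance!r} read against {len(context)} remembered turns"

def handle_turn(user_id: str, utterance: str) -> str:
    reading = interpret(utterance, memory[user_id])
    memory[user_id].append(utterance)  # keep this turn for future conversations
    return reading
```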
NLP-enabled natural conversations on every device
That’s why the future of this technology will see NLP evolve towards multi-intent and speaker emotion (sentiment) detection with long-term context recall. That will advance to even more natural conversational interfaces that not only react to longer sentences but can also draw on memory recall. In the not-too-distant future, NLP will be integrated into personal bots that live on all our devices. The NLP interface in your self-driving Uber, for example, will simply ask for your destination as you get in. And fewer and fewer of us will need a second language, thanks to real-time translation capabilities.
Written by Tibor Vass, Global Director of Solution Strategy Business Automation at Genesys.