The dream of building computers or robots that communicate like humans has been with us for many decades. If market trends and investment levels are any guide, it's something we would really like to have. MarketsandMarkets projects that the natural language processing (NLP) market will be worth $16.07 billion by 2021, growing at a compound annual growth rate (CAGR) of 16.1 percent, and deep learning is estimated to reach $1.7 billion by 2022, growing at a CAGR of 65.3 percent between 2016 and 2022.
Of course, if you’ve played with any chatbots, you will know that it’s a promise that is yet to be fulfilled. There’s an “uncanny valley” where, at one end, we sense we’re not talking to a real person and, at the other end, the machine just doesn’t “get” what we mean.
For example, when using a fun weather bot like Poncho, I might ask, “If I go outside, what should I wear?” The bot responds, “Oops, I didn’t catch that. For things I can help you with, type ‘help’.”
Yet when I ask, “If I go outside, should I take an umbrella?” the bot’s almost too-clever response is “Nah, you won’t need your umbrella in Santa Clara, CA.”
The problem is that, at least so far, computers just can’t do what humans do when it comes to understanding language. As adults, we handle our native language extremely well. We resolve ambiguity at lightning speed, we keep track of which people or things the pronouns in a conversation refer to, and we can tell where a word or phrase begins and ends. Computers can’t yet do this, at least not in a way that convinces most users.
If machines are to be truly intelligent, this gap in language-based communication needs to be closed. After all, if machines lack true language capability, we don’t have true artificial intelligence (AI). It’s fair to say that language understanding is the holy grail of AI, and whichever company cracks it is on the path to unimaginable riches.
Machine learning and neural networks are being applied to NLP with ever-increasing frequency. This approach does produce useful applications, often referred to as AI. However, it doesn’t solve the problem of engaging in meaningful conversation or human-computer interaction, which means the “intelligence” we expect from AI is still missing.
Still looking for meaning
We are quickly reminded of the limitations of language applications when we look at machine translation. Google says its Neural Machine Translation system reduces translation errors by an average of 60 percent compared with its earlier phrase-based production system, as measured in side-by-side evaluations by human raters, and the improvement is noticeable. Yet simple sentences like “The dog that ran past the barn fell” still miss the mark when translated into Chinese and back (although the result, “The dog ran past the barn,” is getting close).
Other companies aiming to build AI systems that can read and comprehend large volumes of complex text in real time, such as Maluuba, which Microsoft recently acquired, are investing in data sets of human-written questions and answers about thousands of news articles. The goal is to help researchers develop algorithms that let a computer reason about a text and answer questions about it correctly.
The issue with these approaches is that meaning is left out of the equation. To put it bluntly, processing ever-larger vocabularies and analyzing how often words occur together in large data sets tells us about usage patterns in a language community, but it says nothing definitive about what any particular phrase or utterance means. Since with language we need to know “what does THIS particular phrase actually mean, right here, right now,” any system that fails at this level hasn’t truly solved the problem of natural language understanding (NLU).
If I say to you, “Clock say one-la, you me hungry, eat go now together,” I’m sure you’ll agree: it’s lunchtime, we’re hungry, and we should go eat now. However, a system trained on large text corpora will score that sentence as extremely unlikely and will therefore reject or ignore it. In such a case, a bot built on a learning algorithm will respond: “I don’t understand, please press help.”
This is an extreme example, designed to make a point. But fundamentally, if we’re going to teach computers to understand what we say, and respond to that understanding rather than to a frequency distribution of words from corpora, a different approach to teaching human language to computers is required.
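To make the frequency-based objection concrete, here is a toy sketch of how a purely statistical model might treat the lunchtime sentence. It uses a bigram language model with add-one (Laplace) smoothing, a standard textbook technique chosen here only for illustration; it is not the method of any product mentioned in this article. The corpus, the `REJECT_BELOW` threshold, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the large text corpora a real
# system would be trained on. Purely illustrative.
CORPUS = [
    "it is one o'clock and we are hungry",
    "we are hungry so let us go eat now",
    "let us go eat lunch together now",
    "it is lunchtime and we should eat now",
]

# Hypothetical cut-off: average log-probabilities below this are "rejected".
REJECT_BELOW = -2.5


def tokenize(sentence):
    """Lowercase, drop commas, split on whitespace, add boundary markers."""
    words = sentence.lower().replace(",", " ").split()
    return ["<s>"] + words + ["</s>"]


def train_bigram_model(corpus):
    """Count how often each token, and each adjacent token pair, occurs."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def avg_log_prob(sentence, unigrams, bigrams):
    """Average log P(word | previous word), with add-one (Laplace) smoothing."""
    # Possible next tokens: every known word, "</s>", and one slot for any
    # unknown word. That equals len(unigrams), because "<s>" is counted
    # there but can never follow another token.
    vocab_size = len(unigrams)
    tokens = tokenize(sentence)
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for prev, word in pairs:
        # Counter returns 0 for unseen keys, so unseen pairs still get mass.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total / len(pairs)


def respond(sentence, unigrams, bigrams):
    """Mimic a bot that ignores anything its statistics find too unlikely."""
    if avg_log_prob(sentence, unigrams, bigrams) < REJECT_BELOW:
        return "I don't understand, please press help."
    return "OK"


if __name__ == "__main__":
    unigrams, bigrams = train_bigram_model(CORPUS)
    for s in ["It is lunchtime, so let us go eat now",
              "Clock say one-la, you me hungry, eat go now together"]:
        print(f"{avg_log_prob(s, unigrams, bigrams):6.2f}  {respond(s, unigrams, bigrams)}")
```

With this toy corpus, the fluent sentence passes, while the pidgin sentence, though perfectly clear to a person, falls below the cut-off and gets the “press help” reply. Real systems use far larger corpora and neural models, but in this framing the judgment is still about likelihood rather than meaning.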
Learning like a human
Rather than trying to learn patterns from big data, we probably need to revisit how humans acquire language if we are to solve the NLU conundrum. By understanding how humans acquire and use language to communicate meaning, we can try to emulate that process in machines.
Broadly speaking, an adult can get by satisfactorily with a vocabulary of only 4,000 words. The magic of daily conversation and communication comes from the way these words are combined to refer to real objects, people, processes, and events in context.
This ability is not acquired by putting a person in a room with thousands of books and asking them to learn collocation patterns and the probability of each one. Rather, each of us, independently and in our own time, learns language by anchoring every word and phrase we encounter to meaning. A word has meaning for us when it matches an experience in our brain that also maps to something in the real world. “That hurts!” makes sense because we feel pain and have experienced it. The word “hurt” is an audible label that stands for the experience of hurting.
Interestingly, even grammar can be seen to map directly to meaning. We understand “I was hungry yesterday” is different from “I am hungry now,” because in our minds we sense “was” as being behind us and “am” as being a space (time) that we occupy at this very moment. In a very real sense, we use elements of grammar to “feel/sense/know” the experience being communicated. Grammar itself is not some abstract thing, but an important part of how we directly experience meaning through language!
To teach computers conversational skills and language, we need to connect language to some form of meaning representation in the computer. Only then do we have the possibility of achieving true AI and human-like language interactions with machines.
The process of understanding language then becomes one of identifying which words and phrases map across to which meanings that are “known” to the machine. And machine learning of language then becomes a process of discovering (or being told by a human) what certain things mean.
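The mapping step described above can be sketched as a lookup from words and phrases to meaning labels the machine already “knows.” This is a minimal, hypothetical illustration: `MeaningLexicon`, `teach`, `interpret`, `demo_lexicon`, and labels such as `BE(PAST)` are invented here and are not any vendor’s API. A human teaches mappings (learning by being told), interpretation greedily matches the longest known phrase, and the tense words “was” and “am” map to different meanings, echoing the grammar example earlier.

```python
from dataclasses import dataclass, field


@dataclass
class MeaningLexicon:
    """Hypothetical store linking words and phrases to meanings the machine 'knows'."""

    entries: dict = field(default_factory=dict)  # tuple of words -> meaning label
    max_len: int = 1  # length of the longest phrase taught so far

    def teach(self, phrase, meaning):
        """Learning by being told: a human links a phrase to a meaning label."""
        words = tuple(phrase.lower().split())
        self.entries[words] = meaning
        self.max_len = max(self.max_len, len(words))

    def interpret(self, utterance):
        """Map an utterance to known meanings, longest phrase first.

        Returns (meanings, unknown_words); unknown words are candidates for
        the machine to ask about or be taught.
        """
        words = utterance.lower().replace(",", " ").split()
        meanings, unknown = [], []
        i = 0
        while i < len(words):
            # Try the longest possible phrase starting at position i first.
            for n in range(min(self.max_len, len(words) - i), 0, -1):
                key = tuple(words[i:i + n])
                if key in self.entries:
                    meanings.append(self.entries[key])
                    i += n
                    break
            else:  # no known phrase starts here
                unknown.append(words[i])
                i += 1
        return meanings, unknown


def demo_lexicon():
    """A few hand-taught entries; tense words map to different meanings."""
    lex = MeaningLexicon()
    for phrase, meaning in [
        ("I", "SPEAKER"),
        ("was", "BE(PAST)"),
        ("am", "BE(PRESENT)"),
        ("hungry", "HUNGRY"),
        ("yesterday", "YESTERDAY"),
        ("now", "NOW"),
        ("go eat", "EAT-MEAL"),  # a multi-word phrase mapped to one meaning
    ]:
        lex.teach(phrase, meaning)
    return lex


if __name__ == "__main__":
    lex = demo_lexicon()
    print(lex.interpret("I was hungry yesterday"))
    # → (['SPEAKER', 'BE(PAST)', 'HUNGRY', 'YESTERDAY'], [])
    print(lex.interpret("I am hungry now"))
    # → (['SPEAKER', 'BE(PRESENT)', 'HUNGRY', 'NOW'], [])
    print(lex.interpret("we go eat now together"))
    # → (['EAT-MEAL', 'NOW'], ['we', 'together'])
```

Greedy longest-match is just one simple lookup strategy. It deliberately sidesteps ambiguity and context, which is where the hard part of NLU lies; the point is only that the output is a set of meanings, not a probability.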
Creating conversational models
Silicon Valley has become a hotbed of companies trying to meet this challenge. Some are employing the approach I’ve laid out here, such as Palo Alto, California-based Pat Inc., which has developed a way to connect phrases directly to meanings stored in a universal meaning-layer dictionary within its NLU API.
Wit.ai, the NLP/NLU platform Facebook acquired to power its Messenger platform, has since released a toolset that can be used to train the platform on new conversation models and to monitor interactions between users and the platform.
The future is promising, and early-stage Valley companies are already producing astounding results. I anticipate great advances in teaching machines human language over the next several years, as long as we collectively keep our focus on how humans approach language.