Investing in AI: When natural language processing pays off

For the past 18 months, my teams at Acxiom Research have worked extensively with a specific form of artificial intelligence called natural language processing (NLP). Our most exciting NLP development is called ABBY — our first artificially intelligent employee. But I'm not just here to talk about ABBY. I'm here to talk about the potential of NLP and how to decide if it's a technology your own company should be exploring.

I want to leave you with two thoughts about NLP:

First, the open-source technology around NLP is so robust you can easily build "on the shoulders of giants" and create amazingly effective NLP applications right now using just a small, highly focused team and a platform approach.

Second, even with such a large amount of powerful technology at your fingertips, creating a front-end NLP (one that “talks back,” which is what most people think of when they think of AI) requires both vision and fortitude. Vision to see the power of the technology and sell it to your internal stakeholders. Fortitude because it will require a significant up-front investment before you see returns from some of the more advanced capabilities you need to develop. You must also be willing to learn the skills of a consumer marketer and deal with issues of changing behaviors already entrained in your user base.

Backend NLP is easier and provides a more immediate ROI

NLP-based improvements to your business need not have a conversational front end. These backend-driven or linguistic analysis projects often offer the fastest, most cost-effective, highest-return way to use NLP in the short term. They typically take a team of two to three people a few months to complete.

Hilary Mason, GM of Machine Learning at Cloudera, presented a good example of backend NLP in a keynote at the most recent Strata Conference. Mason explained how Cloudera lowered its call center costs and improved customer satisfaction using NLP. They took a statistical sample of recorded calls from their call centers and transcribed them to text. They performed textual analysis on this corpus, seeking speech patterns tied to specific issues and problem resolution steps. They then deployed predictive models based on the results of this analysis into their call center systems. When a customer called, the underlying algorithms identified patterns of speech and proactively recommended a likely solution to the customer service representative while the representative was still speaking with the customer. The result, Mason said, was fewer calls to the call center as well as increased customer satisfaction (my team saw the same type of positive results in our own similar project).
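
As a rough illustration of the flow described above (transcripts in, a suggested resolution out), here is a minimal Python sketch. It is not Cloudera's method: the sample transcripts, issue labels, resolution text, and stopword list are all invented for illustration, and a simple word-overlap score stands in for the predictive models a real project would train on a large corpus.

```python
from collections import Counter

# Hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"i", "my", "the", "and", "is", "on", "a", "was", "this", "in"}

# Hypothetical labeled transcripts; a real corpus would hold thousands of calls.
TRAINING = [
    ("my password expired and i cannot log in", "password_reset"),
    ("i forgot my password and the account is locked", "password_reset"),
    ("the invoice shows the wrong amount this month", "billing_error"),
    ("i was charged twice on my invoice", "billing_error"),
]

# Hypothetical resolution steps shown to the service representative.
RESOLUTIONS = {
    "password_reset": "Send the self-service reset link and confirm identity.",
    "billing_error": "Open the billing record and compare the last two invoices.",
}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = (w.strip(".,?!") for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]


def build_profiles(examples):
    """Count how often each word appears in the transcripts for each issue."""
    profiles = {}
    for text, issue in examples:
        profiles.setdefault(issue, Counter()).update(tokenize(text))
    return profiles


def recommend(utterance, profiles):
    """Return (issue, resolution) for the best word overlap, or None if no word matches."""
    words = tokenize(utterance)
    scores = {issue: sum(counts[w] for w in words) for issue, counts in profiles.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None
    return best, RESOLUTIONS[best]


PROFILES = build_profiles(TRAINING)
```

Calling `recommend("I was charged twice.", PROFILES)` while the call is in progress would surface the billing resolution to the representative. Returning `None` when nothing matches keeps the tool from offering a guess it has no evidence for.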

Conversational NLP costs more and requires long-term vision

Once you focus on conversational NLP (or AI), where you want the machine to interact with a human in a way that has something even vaguely like the fluidity and imprecision of normal human speech, the problem becomes technically challenging and expensive. I am not speaking here of chatbots. A chatbot is a very simple machine that can follow a relatively structured conversation for a specific task and sits in certain pre-defined environments like Facebook Messenger. Conversational AIs are completely different. Similar to Alexa, they are ubiquitous (they are wherever you are), can handle multiple applications (also called intents), and can deal with the wide range of responses even one person can give to the same statement. They can also change contexts rapidly -- say from providing information about today’s weather to making restaurant reservations.

Multiple open-source platforms already exist to allow your teams to build a functional, if imperfect, AI in a reasonable time frame and at a cost that provides a positive return. Companies like Apple, Google, Microsoft, and Amazon have each poured literally hundreds of millions of dollars, and the efforts of some of the brightest PhDs on the planet, into advanced NLP interfaces. These open-source libraries allowed us to build a foundational platform for a simple conversational AI in about a year, with a team of 3-4 people, for approximately $500,000. That early platform has a few simple intents, no pre-conversation awareness of the user (since that requires an interface with secure systems), and no memory of prior user sessions. From there, depending on the complexity of the intent, we have been able to deliver each new function for between $10,000 for a simple intent (e.g., weather) and $25,000 for a more complex intent (e.g., conference room reservations).

We view the platform as an investment to be spread across all apps built in a two-year payback period. Since we expect to add 48 new intents over that period, amortizing the platform adds about $10,000 to the cost of each intent. That is one way we cost-justify a new intent. For example, allowing people to self-service on a lost/forgotten password or other simple IT issues saves the time of at least one IT person a year. A quick calculation using the IRR function in Excel, assuming that role costs $100,000/year, puts the single-year ROI of that “complex” app at ~260 percent, which makes it worth doing. Cost is only one factor we use in prioritizing which intents to build, and sometimes we invest even without a strong ROI. But we do use it as a guideline.
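
The amortization arithmetic above can be sketched in a few lines of Python. The dollar figures come from the text; the function names are hypothetical, and the ROI formula is a plain first-year return, not Excel's IRR. Because the cash-flow inputs behind the ~260 percent figure are not given, this sketch should not be expected to reproduce that number.

```python
PLATFORM_COST = 500_000        # one-time platform build, from the text
PLANNED_INTENTS = 48           # intents expected over the two-year payback period
COMPLEX_INTENT_COST = 25_000   # build cost of a "complex" intent, from the text
ANNUAL_SAVINGS = 100_000       # assumed yearly cost of the IT role whose time is saved


def amortized_platform_cost(platform_cost, planned_intents):
    """Share of the platform investment carried by each planned intent."""
    return platform_cost / planned_intents


def simple_roi(annual_savings, total_cost):
    """First-year return as a fraction of total cost (an assumption, not Excel's IRR)."""
    return (annual_savings - total_cost) / total_cost


platform_share = amortized_platform_cost(PLATFORM_COST, PLANNED_INTENTS)
all_in_cost = COMPLEX_INTENT_COST + platform_share
print(f"Platform share per intent: ${platform_share:,.0f}")
print(f"All-in cost of a complex intent: ${all_in_cost:,.0f}")
print(f"Simple first-year ROI: {simple_roi(ANNUAL_SAVINGS, all_in_cost):.0%}")
```

Keeping the platform share as a separate term makes it easy to see how the case for a single intent shifts if fewer than 48 intents are actually delivered.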

The following table provides an overview of some open-source tools worth looking into.

Adoption by end users and developers

A conversational AI platform needs two forms of adoption to succeed: one with end users and, equally important, one with developers.

Achieving adoption of any new technology by a majority of end users is an arduous process. It is particularly difficult when users are reluctant to give up existing tools and ingrained behaviors. Purveyors of new technologies to consumers are well aware of this. They build a substantial adoption curve and associated marketing budgets into their business models. But developers and even product managers at many companies, especially those in B2B markets, have little experience with consumer adoption curves. They don’t factor it into their plans and, equally important, their managers don’t understand that curve either. There is very little patience or capability in many organizations for the kind of persistent messaging and salesmanship needed to gain widespread adoption of conversational interfaces. The result is that many front-end NLP projects never achieve adoption, which limits further investment.

My teams overcame this challenge with our ABBY project by treating the deployment of ABBY’s intents like any other typical new product marketing problem -- we assigned a part-time product marketer to develop and execute marketing programs for internal adoption. We also developed a group of early adopters/beta testers who understand that part of their role is to promote the new intents to their peers in the organization. Lastly, our entire team is tasked with selling ABBY’s capabilities wherever we can when interacting with people in the organization. Just like in any standalone small company, everyone on the team is a salesperson.

But no matter how well you execute on internal marketing, front-end NLP is still a long-term evolution, and both the end user’s behavior and the capabilities of the AI are going to evolve over time as developers, the AI, and end users interact. It is for this reason that it is critical to develop an NLP platform for developers across the organization to use. Just as in an open marketplace, no one group can conceive of or build all the apps that may be important to the other various users or groups in your company. One way to enhance adoption is to have lots of teams building NLP apps for the conversational front end. Thus, developer adoption is a second critical element in the adoption cycle. We use many tools to promote adoption. We actively reach out to developers through team meetings, one-on-ones, and an NLP Special Interest Group. We also have NLP projects available for our regular quarterly hackathons.

Efficacy and task-oriented design

This brings us to another design issue -- efficacy. The intents to invest in are those that make an existing experience more effective, more efficient, or both. If it takes longer to do something conversationally, people will not use your AI. This is especially true where there is an ingrained behavior and significant, conscious extra effort is required for the end user to shift behavior. In our case, our phone directory project was a good investment because it was previously time consuming and inconvenient to get a person’s contact information from our internal systems. Once people used ABBY’s directory intent a few times, they began to switch. The same is true of room reservations. But when users were able to perform Google searches from within ABBY, we got very negative feedback. People thought we were silly to invest in an app when they could just switch to a browser and do a search that provided more robust information content in a format they understood.

Where is the killer app?

A question I often get: “Where is the killer app?” The one area where conversational AI is making substantial inroads is customer service. But customer self-service is an instance of a broader class you can think of as diagnostics. That class of problems may define what can or cannot be a killer app for conversational AI. The question to ask with task-oriented users is, “When do they want or need to talk at length to an AI to accomplish a goal?” The answer is two-fold. One element is where the resolution of a task requires many back and forth interactions between the user and the “helper.” The second is when many words are needed because the item to be described is inexact, so the user is trying to string together a “close enough” description for the listener to guess at the actual item. Computer service is a great example. Buying a complex product like data via an online interface is another. A third is research and tabulation of information from data, which can be thought of as the “diagnosis of data” to determine an information outcome. In all these cases, end users must engage in a “ranging exercise,” where they start with a broad concept or set of possibilities and, through a series of interactive steps, restrict the set of possibilities until a final result is found or set of conclusions is reached.

The reality is, however, there may be no killer app. Very few apps are used by everyone. Given that such a universal intent as a phone directory requires promotion, imagine how much harder it is to gain adoption of intents focused on a single set of users. The analogy is mobile phones. There are very few universal apps in mobile. Most people use 10-15 apps. But the exact 10-15 are unique to each person. App downloads through the app store have a short “head end” and a long tail. App use is very idiosyncratic. It is very similar with apps within an organization with the caveat that the individual’s role has a very strong correlation to which apps they are most interested in. This is why having a platform, and adoption by developers, is so critical. Each department may have its own “killer app” that its end users will adopt gladly, and it is the developers living in that context who will see the need most clearly.

User experience

AIs get one chance to make a first impression. User experience with AIs is one of the most critical factors in adoption and one least appreciated by those not expert in building AIs. Once again, we come back to rules of consumer product design: You get one shot to make a first impression with consumers, who generally have little tolerance for buggy or incomplete functionality. Too many teams without direct experience building products for consumers release a buggy MVP, thinking users will forgive the interface for the better functionality. The opposite is true. Many AI projects die because development teams don’t take the interface far enough on all the deployed platforms (e.g., mobile is very different from desktop) before releasing it. Users have a poor experience and never come back. Ensuring a good experience can be something as simple as the project manager putting himself or herself in the shoes of a new user, running through all the basic phrases that someone might use on the AI, and fixing those it doesn’t understand before release.

In my mind, the single biggest reason AI projects fail is that their creators do not do enough training and conversational curation prior to release.

Have a human-like interface, but not too human. AIs do not need to have the same capabilities as people. As I am constantly reminded by my team, people should do what people are good at and AIs should complement that. They should be created to leverage the strengths of the computational environments in which they operate.

Having said that, gaining adoption requires that users can interact with an AI in a way that seems natural to them. This has three aspects:

1. Flexibility in input and response. Humans don’t always use the same words to say the same thing. How they respond to things depends on the time of day, their emotional state, who they are talking to, and numerous other factors. An AI must also be able to respond in this way to seem intelligent enough to end users to convince them it is worth conversing with the machine. Ten variants of a specific phrase such as “How may I help you?” appear to be enough to mimic human speech variability.

2. Emotional context. At the same time, we give ABBY an emotional context. So not only will she say different phrases, her responses are also dictated by an emotion setting that ranges from happy to sad and impacts things said around the random phrase. For example, if I say “Good morning, ABBY” and her switch is set to happy, she will respond “Good morning, Arthur. Lovely day today isn’t it?” If her switch flips to sad for the day, she might instead respond “Hi Arthur. Sad to say, I’m having kind of a sucky day, but hope yours is going well.”

Another example of emotional context in play is what happens when someone says something disrespectful to ABBY, curses at her, or otherwise uses abusive language that would be offensive to any employee or is outside of Acxiom’s corporate policies or cultural norms. In this case, ABBY is trained to respond as if she were an employee. At first, she shows annoyance:

“Please do not speak to me that way. I have a very sensitive nature and do not appreciate abusive or inappropriate language.”

If the abusive language continues (which happens because users like to test ABBY’s limits), there is an escalation conversation flow with increasing emotional content. If the abuse doesn’t stop after several steps, ABBY creates and sends an email to HR reporting the abusive language, and also notifies the user that she has made the report.

3. Cleanly handle what it doesn’t understand. We constantly remind end users that ABBY is a year-and-a-half old and that they should expect her understanding to be limited accordingly. There are many phrases she will not understand, especially in the early deployment of a new intent, when significant training is still needed. Unless we set their expectations accordingly, they may anticipate she will respond like an adult with a full vocabulary. So, we remind users that her abilities are limited via a three-step response profile:

“I’m sorry I don’t understand that. Can you please rephrase that?”
“I’m sorry I still didn’t understand. Can you try one more time, please?”
“I’m only one-and-a-half years old and have a lot to learn still. Please be patient with me. I have logged this conversation for review by my team. Please come back again tomorrow and try your question. With your help, I should get better at being able to answer your question.”
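
The three aspects above (varied phrasing, an emotion setting, and a stepped reply to misunderstood input) can be sketched together in Python. This is an illustration under stated assumptions, not ABBY's implementation: the function and class names are hypothetical, only three of the roughly ten suggested variants are shown, the third fallback message is shortened, and appending a help phrase to the mood-based greeting is a choice made for this sketch.

```python
import random

# Hypothetical variants of one phrase; the text suggests about ten is enough.
HELP_VARIANTS = [
    "How may I help you?",
    "What can I do for you?",
    "How can I be of service?",
]

# Mood-dependent greetings, worded after the examples in the text.
MOOD_GREETINGS = {
    "happy": "Good morning, {name}. Lovely day today isn't it?",
    "sad": "Hi {name}. Sad to say, I'm having kind of a sucky day, but hope yours is going well.",
}

# The three-step response profile (last step shortened for this sketch).
FALLBACK_STEPS = [
    "I'm sorry I don't understand that. Can you please rephrase that?",
    "I'm sorry I still didn't understand. Can you try one more time, please?",
    "I have a lot to learn still. I have logged this conversation for review by my team.",
]


def greet(name, mood, rng=random):
    """Combine the mood-specific greeting with a randomly chosen help phrase."""
    opening = MOOD_GREETINGS[mood].format(name=name)
    return f"{opening} {rng.choice(HELP_VARIANTS)}"


class FallbackLadder:
    """Tracks consecutive misunderstandings per user and steps up the reply."""

    def __init__(self, steps):
        self.steps = steps
        self.misses = {}

    def on_miss(self, user):
        """Return the reply for this miss; stay on the last step once it is reached."""
        count = self.misses.get(user, 0)
        self.misses[user] = count + 1
        return self.steps[min(count, len(self.steps) - 1)]

    def on_success(self, user):
        """Reset the ladder once the user is understood again."""
        self.misses.pop(user, None)
```

The same counter-per-user pattern could plausibly drive the abusive-language escalation described earlier, with the final step triggering the report instead of a reply.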

Lastly, we consciously make design choices so ABBY does not appear too human. Google recently learned that making an AI that cannot be distinguished from a human is “creepy” to a large number of people. Someday, intelligent automation will be so thoroughly woven into the fabric of our daily lives that we will simply assume that most easy tasks -- like restaurant reservations -- will be handled by a machine, not a human. But until then, people want to know when they are dealing with a machine, not a person. ABBY has built-in limitations that indicate she isn’t human, such as her generally limited vocabulary; and some of her word choices deliberately sound a tad machine-like.

Be task-oriented. Users don’t want to chat with typical business AIs, except perhaps for the first 10 minutes, mainly out of curiosity for how “human-like” the AI is. Beyond that, people work through an AI to get a task done and move on. Don’t waste a lot of time on what are known as small-talk intents. People hardly use them. Focus your design on completing specific tasks as efficiently as possible. Don’t require people to memorize special codes or to type in long strings, especially for mobile. Use the shortest text possible that a human would understand.

For example, to have ABBY book a conference room on mobile, all you have to type in is “book room <city>.” This finds you a room open right now until the end of the current half hour. Why? Because when people are typing “book room” on a mobile device, they are usually running around the building looking urgently for an open room at that moment. ABBY responds clearly in natural language so they don’t have to guess about what is happening (e.g. the start time or length of booking). “I have booked Mt. Shasta for you for 25 minutes until 3:30 today. It seats 4 people and can be found on 17th Floor East.” That is an elegant, task-oriented user experience that is efficient for the user and makes ABBY seem intelligent. Of course, users can be verbose if they wish to be, and there are also short codes for those who prefer compactness, like “Book room T 2 1 SF,” which means “Book a room today at 2 p.m. for one hour in San Francisco.”
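
A minimal Python sketch of how the two command forms above might be parsed follows. It is an assumption-laden illustration rather than ABBY's actual parser: the function names and returned dictionary are hypothetical, and treating a bare hour below 8 as p.m. is a rule invented here so that "2" resolves to 2 p.m. as in the example.

```python
from datetime import datetime, timedelta


def end_of_half_hour(now):
    """Return the next :00 or :30 boundary strictly after `now`."""
    base = now.replace(minute=0, second=0, microsecond=0)
    if now.minute < 30:
        return base + timedelta(minutes=30)
    return base + timedelta(hours=1)


def parse_booking(command, now):
    """Parse 'book room <city>' or the short code 'book room T <hour> <hours> <city>'."""
    tokens = command.split()
    if len(tokens) < 3 or [t.lower() for t in tokens[:2]] != ["book", "room"]:
        raise ValueError("not a booking command")
    args = tokens[2:]
    is_short_code = (
        len(args) == 4
        and args[0].upper() == "T"
        and args[1].isdigit()
        and args[2].isdigit()
    )
    if is_short_code:
        hour = int(args[1])
        # Assumption: bare hours below 8 mean p.m., since bookings fall in business hours.
        if hour < 8:
            hour += 12
        start = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        end = start + timedelta(hours=int(args[2]))
        city = args[3]
    else:
        # Default: a room from right now until the end of the current half hour.
        start = now
        end = end_of_half_hour(now)
        city = " ".join(args)
    return {"city": city, "start": start, "end": end}
```

With `now` set to 3:05 p.m., the default form yields a 25-minute booking ending at 3:30, which matches the confirmation message quoted above. Defaulting the time window is what lets the mobile command stay three words long.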

Design for ubiquity. An AI needs to be wherever I am -- a ubiquitous companion -- if for no other reason than adoption accelerates when the technology is an ambient, continuous presence. Thus, it needs to be available and work in any environment I may work in. Applications like Slack are a wonderful first environment, since end users are in Slack all day and, equally important, Slack has an app for tablets and mobile devices that people use constantly. People use browsers all day as well, so an interface within or attached to a browser can also be important. Even more ubiquitous is having the AI in the background on computing desktops, where it can provide an interface into many applications. It can be on automated check-in systems to buildings, or on tablets that sit on the wall outside conference rooms. Or it can sit passively in a conferencing app like BlueJeans, awaiting, for example, a request from users (spoken, in this particular use case) to open a document or search for information on Google. This last example shows the power of context. Because people are already speaking, it is not interruptive to speak to the AI. Designing for ubiquity means more than having the AI in a specific environment. It also means adapting the design to be most efficacious for users in that specific context.

Role of memory in AI. A critical factor we associate with intelligence is the ability to remember what we have done in the past. A person you speak with would find it odd if you don’t remember you had a conversation with them yesterday. Similarly, AIs need to have the ability to remember past interactions with a user for several reasons. First, it is an expected function of an intelligence, artificial or otherwise. Second, it implies to the end user true recognition: “I know you because I remember all the things we’ve done together in the past.” Third, history allows for more efficient interactions for the end users. Customers want an AI to remember their prior interactions and purchases so they can easily refer back and thus save the time of having to repeat prior work. Fourth, history allows for improved predictions, more intelligent, more efficient interactions, better service quality, and improved sales. Someone who has purchased certain items in the past may be more likely to buy them again in the future and/or may be likely to buy other associated products at a later time.

Adding memory to AI is the current frontier of the technology. We will see intents with significant memory appearing in AIs sometime in the next 12 to 24 months.
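
To make the idea of memory concrete, here is a deliberately simple Python sketch of a per-user interaction history. It only illustrates the "remember what we did together" behavior described above. The class and method names are hypothetical, storage is in-process only, and the hard parts of real conversational memory (persistence, privacy, and deciding what is worth recalling) are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Interaction:
    when: datetime
    intent: str
    detail: str


class SessionMemory:
    """Keeps a chronological list of past interactions for each user."""

    def __init__(self):
        self._history = defaultdict(list)

    def remember(self, user, intent, detail, when=None):
        """Record one completed interaction for the user."""
        self._history[user].append(Interaction(when or datetime.now(), intent, detail))

    def knows(self, user):
        """True if the AI has interacted with this user before."""
        return bool(self._history.get(user))

    def last(self, user, intent):
        """Return the most recent interaction for an intent, or None if there is none."""
        for item in reversed(self._history.get(user, [])):
            if item.intent == intent:
                return item
        return None
```

With something like this in place, a booking intent could offer "the same room as last time" by calling `last(user, "book_room")`, which is the kind of saved repetition the third and fourth reasons above point to.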

Conclusion

We are a very long way from having AIs of the quality we see in movies, so don’t expect Jarvis to appear on your desktop any time soon. But can you use natural language processing today to build applications and interfaces that speed and simplify your business while increasing customer satisfaction for a reasonable cost? Absolutely, thanks to the number of open-source tools already available.

There are gains to be made using both back-end NLP technologies and front-end conversational interfaces. Each provides the business with different kinds of capabilities.

NLP and AI are only going to become more critical to our businesses, and companies ignore them at their peril.

Arthur Coleman is GM of Acxiom Research. He focuses on enhancing cross-channel marketing using emerging technologies such as natural language processing, AI/machine learning, blockchain, digital fingerprinting, and more. He is also actively involved in setting industry standards for consumer privacy with the IAB Tech Lab.