AI is becoming the new user interface. From self-driving cars and Amazon’s Alexa to robo-advisors and facial recognition locks, consumers are interacting with AI like never before. And this is just the beginning.
For years, AI enthusiasts have used the Turing test as a guide for developing conversational bots. Developed in 1950, the Turing test focuses on believability, analyzing a machine’s ability to behave indistinguishably from a human; researchers have long considered passing the test the holy grail of AI. This benchmark, though, was created in an era when AI wasn’t common, and teams built machines with the explicit goal of producing a human clone.
Over the past few decades, Hollywood’s portrayal of AI in movies like Her and I, Robot also sought to replicate human characteristics. Tinseltown’s version goes way beyond what today’s commercial tech can achieve, but we still seem to measure modern applications of AI against these fictitious interpretations.
Solve problems, don’t just ape humans
Today, we’re somewhere between the Turing test and Hollywood’s in-your-face robots. AI is surpassing human capacity in subtle but powerful ways, like diagnosing diseases. It’s the technology powering some of the most advanced applications in the consumer tech market, and we’re only on the cusp of widespread implementation.
In the application of modern AI, the number one goal is to solve problems. Reproducing human characteristics is only one ingredient in the complex concoction that is an effective AI, and many human characteristics are even counterproductive. Yet we still see engineers building in time delays so conversational AI responses appear as though a bot is “thinking,” and using similar tactics to contort technology into passing the Turing test.
When aeronautical engineers designed the 747, they tested whether it could cross the Atlantic — they didn’t try to build a mechanical pigeon. Similarly, self-driving cars learn in a unique way and behave much differently than cars with a human behind the wheel. Why should AI have to hew to the human model?
With conversational AI’s growing prominence, it is critical to have a universal, realistic understanding of what counts as success and what falls short of today’s standards. AI will make a different set of mistakes than humans do and will also learn from those mistakes differently. This means we need to measure success for machines differently than we do for humans.
New success metrics for AI
So how do we update the Turing test for practical applications of conversational AI? We need to get away from how “advanced” it feels and focus on the primary goal: efficiency. We should regard AI as providing a significantly better alternative to how we solve problems today. As we move forward, we also need to widen the scope to encompass all intelligent behavior that could be useful to the end user. Here are several KPIs researchers could use to more accurately measure the success of AI.
- How it applies context: AIs should not work in a vacuum, but be situationally aware. Conversational AIs have increasing access to various contextual triggers that should tailor the experience, and they are in a unique position to leverage this data in ways that would not necessarily work with human agents. For instance, as a consumer, if a human customer service rep knew exactly where I was, I might feel creeped out. With an AI, I might think it’s cool, especially if it’s giving me something with immediate relevance.
- How it learns over time: An AI should learn from every interaction. For example, a researcher might consider if a bot provided the right information based on a person’s response and tone. They also might want to look more closely at the user questions that the machine is unable to answer. The sign of a good AI is not top performance on day one, but an upward-trending curve.
- How comprehensive and connected it is: Most “great” conversational AIs so far are really good at one single thing, which is not practical for the long term. AIs need to connect with various systems to span the entire customer journey, enabling the person to complete everything in a single place. A retailer AI, for instance, needs to personalize product recommendations, manage a CRM, process orders, provide status updates, and handle customer support.
- How well it holds memory: A person should never have to reintroduce themselves. Conversational AIs need to have short-term and long-term memory, keeping and acting on what an individual has liked in the past. When you call customer service, send an email, or walk into a store today, you’re a stranger; your preferences, past purchases, and social comments on a brand all are unknown. A compelling AI will act on the data seeds a person has sprinkled over the entire conversation.
- How it predicts needs: AIs need to tap into predictive algorithms that can anticipate what a consumer might need based on their historical context and current situation. The AI should analyze aggregate data to identify the best course of action from what has resulted in the most positive sentiment in similar circumstances.
- How flexible it is: AIs need to be where the consumer is. Good AIs can’t be available only on chat, or websites, or voice calls. What will distinguish successful AIs from the rest will be cross-platform performance and the ability to hold the same knowledge base at every touchpoint.
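To make a couple of these KPIs concrete, here is a minimal Python sketch of how a team might instrument a bot for two of the metrics above: learning over time (an upward-trending resolution rate) and memory (preferences retained across sessions). The `BotEvaluator` class, its method names, and the period-based bookkeeping are hypothetical illustrations, not an existing library or any particular vendor’s approach.

```python
from collections import defaultdict

class BotEvaluator:
    """Hypothetical harness for two of the KPIs above: learning over
    time (resolution rate per period) and memory (preferences a bot
    retains across sessions)."""

    def __init__(self):
        self.outcomes = defaultdict(list)   # period -> list of resolved flags
        self.memory = defaultdict(dict)     # user_id -> remembered preferences

    def log_interaction(self, period, resolved):
        """Record whether a single interaction was resolved successfully."""
        self.outcomes[period].append(resolved)

    def resolution_rate(self, period):
        """Fraction of interactions resolved in a given period."""
        results = self.outcomes[period]
        return sum(results) / len(results) if results else 0.0

    def is_improving(self):
        """The sign of a good AI is not day-one performance but an
        upward-trending curve: each period at least as good as the last."""
        periods = sorted(self.outcomes)
        rates = [self.resolution_rate(p) for p in periods]
        return all(a <= b for a, b in zip(rates, rates[1:]))

    def remember(self, user_id, key, value):
        """Persist a preference so the user never reintroduces themselves."""
        self.memory[user_id][key] = value

    def recall(self, user_id, key):
        """Act on data seeds the user sprinkled in past conversations."""
        return self.memory[user_id].get(key)
```

In practice, `log_interaction` would be fed by whatever signal the team trusts for "resolved" (explicit user confirmation, sentiment, or no escalation to a human), and the memory store would live in a database rather than in process memory; the point is that both curves are measurable without asking whether the bot "feels" human.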
AI is not human. And humans are not AI. There are always going to be things that a human will do better — having empathy and solving complex first-time issues are a couple of good examples. Only when AIs exhibit the ability to solve problems more quickly and intelligently than humans can we start flying over oceans, sidestepping the blueprint of the mechanical pigeon.
Puneet Mehta is the founder and CEO of msg.ai, a conversational AI platform for marketing, commerce, and customer service.