Here’s a stat for you: 40 percent of chatbot users only engage in one conversation.
This statistic, calculated by İlker Köksal, the cofounder of Botanalytics, suggests bot makers need to invest more effort to measure performance and deliver value to users. Traditional metrics like daily active users (DAU) and analytics tools like Google Analytics or Mixpanel work well for websites and mobile apps, but the unique conversational nature of chatbots requires a different perspective on performance.
Traditional metrics may even be misleading. Session length is often used as a proxy for user engagement on web and mobile. However, many chatbots are utilitarian and should be a functional shortcut compared to their app or website counterparts. Increased session length could mean users are confused or the conversational flow is inefficient.
With the skyrocketing popularity of chatbots, bot developers have collected enough data to learn what is and isn’t working. Bot analytics companies like Dashbot and Botanalytics have collectively processed close to 100 million messages, giving them a bird’s-eye view of which metrics are most useful. Developers on their platforms have tried dozens of new measurements to identify the best ways to improve their bots.
Here’s what we’ve learned are the five chatbot metrics that produce the most useful insights.
1. Active and engaged rates
Many users barely interact with a chatbot before moving on. Forty percent of a bot’s users only interact with it one time. Given the high churn, identifying and nurturing active and engaged users is key to long-term success.
Dennis Yang, cofounder of Dashbot, recommends examining active and engaged rates to combat churn. When a user reads a message in a session, that session is considered “active.” When a user responds with a message in a session, that session is considered “engaged.”
Active rate equals a user’s number of active sessions divided by that user’s total number of sessions.
Engaged rate equals a user’s number of engaged sessions divided by that user’s total number of sessions.
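The two definitions above can be sketched in a few lines of code. This is a minimal illustration, not Dashbot’s actual implementation: the `Session` record and its `read_message`/`sent_message` flags are hypothetical stand-ins for whatever event data your bot platform logs.

```python
from dataclasses import dataclass

@dataclass
class Session:
    """One user session; the flags below are illustrative, not a real Dashbot schema."""
    user_id: str
    read_message: bool   # user read at least one bot message -> session is "active"
    sent_message: bool   # user replied with a message -> session is "engaged"

def active_rate(sessions):
    """Fraction of a user's sessions in which they read a message."""
    return sum(s.read_message for s in sessions) / len(sessions)

def engaged_rate(sessions):
    """Fraction of a user's sessions in which they sent a message."""
    return sum(s.sent_message for s in sessions) / len(sessions)

sessions = [
    Session("u1", read_message=True, sent_message=True),
    Session("u1", read_message=True, sent_message=False),
    Session("u1", read_message=False, sent_message=False),
    Session("u1", read_message=True, sent_message=True),
]
print(active_rate(sessions))   # 0.75
print(engaged_rate(sessions))  # 0.5
```

Since every engaged session is also active, a user’s engaged rate can never exceed their active rate; a large gap between the two suggests users read the bot’s messages but see no reason to reply.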
How do you optimize active and engaged rates? Yang suggests you answer this question: What are the top messages users send my chatbot?
The makers of Machaao, a popular Facebook Messenger chatbot for cricket fans, increased user engagement by 300 percent by analyzing and adapting to how the most active and engaged users spoke to the bot. Users’ messages reflect their expectations around how a bot should behave. Fitting their mental models is usually a winning strategy for boosting engagement.
“We figured out that the top message sent to our bot was the Like button,” says Harshal Dhir, founder of Machaao. Inspired by watching active rates, engaged rates, and top messages, Machaao’s developers enabled easier expression of Likes and also prioritized news and schedule formats that matched the expectations of its users.
2. Confusion triggers
The nascent chatbot industry has yet to develop the optimal user experience with conversational UI. Challenges exist throughout the funnel: bringing users to a bot, communicating functionality, driving towards action, and handling inquiries and errors.
Given the huge range of possible user input, chatbots often misinterpret or can’t understand what a user wants. Thus, the incidences when your bot says a version of “I don’t understand” must be closely watched. Also useful is seeing what user inputs caused the bot’s confusion.
StreakTrivia is a bot that runs a massively multiplayer trivia game on Facebook Messenger every day. The bot asks players true-or-false questions and presents quick-reply buttons to answer with. By closely watching its confusion rate, StreakTrivia’s team caught an issue they would otherwise have missed: much of the bot’s confusion came from users typing in “true” or “false” instead of tapping the provided buttons.
Tracking confusion rate also helps triage when human intervention is needed. Just as a bad customer support associate can ruin a customer’s opinion of your brand, so can a bad chatbot experience. Homing in on high-risk scenarios and escalating to trained staff dramatically reduces churn and provides a critical opportunity to learn about user needs.
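Tallying confusion triggers is straightforward if you log each user input alongside the bot’s reply. The sketch below is a hypothetical example, not StreakTrivia’s code: the fallback phrases and the log format are assumptions, and the tally shows how typed “true”/“false” answers would surface as the top trigger.

```python
from collections import Counter

# Hypothetical fallback replies your bot sends when it can't parse the input.
FALLBACKS = {"Sorry, I don't understand.", "Can you rephrase that?"}

def confusion_triggers(log):
    """Tally user inputs (case-folded) that immediately preceded a fallback reply."""
    return Counter(user.lower() for user, bot in log if bot in FALLBACKS)

# log entries are (user_input, bot_reply) pairs
log = [
    ("true", "Sorry, I don't understand."),
    ("True", "Sorry, I don't understand."),
    ("next question", "Correct! Next up..."),
    ("true", "Can you rephrase that?"),
]
print(confusion_triggers(log).most_common(1))  # [('true', 3)]
```

Dividing the total number of fallback replies by the total number of bot replies gives an overall confusion rate to watch over time, while the per-input tally points at which phrases the bot should learn to handle.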
3. Conversation steps
“A long conversation doesn’t necessarily mean an engaged user,” says Botanalytics’ Köksal. He points out that chatbots like Uber’s want users to order a car with as few steps as possible. If a user’s conversation with an Uber chatbot takes more than 50 back-and-forth messages, the experience is a clear failure, since the user would be better off using the app.
Köksal defines “conversation step” as a single back-and-forth exchange between a user and a bot. For example, if a user says “hi” and the bot replies with “hi” back, that’s one conversation step.
“Every chatbot [developer] needs to know their average conversation steps,” says Köksal. Utility-driven chatbots have a lower average conversation step count than entertainment-driven chatbots. Regardless of the chatbot type, conversations that significantly exceed or fall short of the average usually indicate a bad user experience: either a user gave up too quickly, or the bot took too long to complete the user’s goal.
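Under Köksal’s definition, counting steps amounts to counting user messages that receive a bot reply. The snippet below is a minimal sketch assuming a conversation is stored as an ordered list of `(sender, text)` pairs; the data format is an assumption for illustration, not Botanalytics’ API.

```python
def conversation_steps(conversation):
    """One step = a user message immediately followed by a bot reply.

    `conversation` is an ordered list of (sender, text) pairs,
    where sender is "user" or "bot" (an assumed format).
    """
    senders = [sender for sender, _ in conversation]
    return sum(1 for a, b in zip(senders, senders[1:]) if a == "user" and b == "bot")

def average_steps(conversations):
    """Mean conversation step count across many conversations."""
    return sum(map(conversation_steps, conversations)) / len(conversations)

convo = [
    ("user", "hi"),
    ("bot", "hi"),
    ("user", "book me a ride"),
    ("bot", "Where to?"),
]
print(conversation_steps(convo))  # 2
```

With `average_steps` in hand, flagging conversations far above or below the mean becomes a simple filter, which is exactly the kind of segmentation PennyCat used to separate Game Lovers from Discount Lovers.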
PennyCat, a Facebook Messenger bot that allows users to play games and find coupons, uses conversation steps to segment and redirect their users. The team easily separates users into “Game Lovers” and “Discount Lovers” because Game Lovers’ conversations typically exceed 40 conversation steps.
Once PennyCat identified the Game Lovers, they targeted them with select coupons in order to convert them to Discount Lovers. Segmenting and targeting users based on conversation steps led to a 70 percent increase in coupon use.
4. Average number of conversations per user
The number of conversations a user has with a bot is just like the number of sessions a user starts with a mobile app. The metric is highly correlated with engagement. According to Köksal, the average conversations per month range from 1.42 to 4.79 for the bots on the Botanalytics platform.
Paying attention to how the average number of conversations per user fluctuates over time gives bot developers insight into potential shortcomings and how to fix them. Developers of one recruiting bot on the Facebook Messenger platform noticed their bot’s average conversations across users had dropped. To resolve the problem, they studied users whose number of conversations was below the bot’s average and noticed that those users weren’t engaging with the job offerings presented. The team changed the bot to ask users for more background information, delivered more relevant results, and saw a boost to engagement and retention.
Average number of conversations per user can also reveal if a new feature is working. StreakTrivia started out with a metric of 2.5 average number of conversations per user, but then saw a 224 percent gain to 8.1 after they implemented a “play with friends” feature.
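The metric itself is a one-liner: total conversations divided by distinct users. The sketch below assumes you can produce the list of user IDs, one entry per conversation, from your logs; the sample data is purely illustrative.

```python
from collections import Counter

def avg_conversations_per_user(conversation_user_ids):
    """Total conversations divided by the number of distinct users.

    `conversation_user_ids` holds one user ID per conversation (assumed log format).
    """
    counts = Counter(conversation_user_ids)
    return sum(counts.values()) / len(counts)

# 6 conversations spread across 3 users
ids = ["u1", "u1", "u2", "u3", "u1", "u2"]
print(avg_conversations_per_user(ids))  # 2.0
```

Computing this per week or per month, as in Botanalytics’ 1.42-to-4.79 range, just means filtering the log to conversations within that window before counting.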
5. Retention: 1-day, 7-day, and 30-day
Retention is not a chatbot-specific metric, but the best retention period to focus on varies based on a bot’s purpose.
For example, finding a job usually takes a minimum of 20 days of searching, so a one-day or seven-day retention metric is insufficient. When the recruiting bot I mentioned earlier noticed that retention dropped after 16 days, they fixed their problem by increasing the quality and relevance of jobs shown after day 12.
“Bots that offer repeat services like food delivery should focus on seven-day retention, whereas a content or media bot relies on daily engagement and benefits most from analyzing one-day retention,” says Köksal. “If a user doesn’t like the format of the content presented, they’re unlikely to come back the next day.”
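One common way to compute N-day retention is the share of users who come back exactly N days after their first conversation. The sketch below uses that definition as an assumption; other definitions (e.g., “active at any point within N days”) exist, and the dictionaries of dates are a hypothetical log format.

```python
from datetime import date, timedelta

def n_day_retention(first_seen, activity, n):
    """Share of users active again exactly n days after their first conversation.

    first_seen: {user_id: date of first conversation}
    activity:   {user_id: set of dates with at least one conversation}
    (Both structures are assumed log formats for illustration.)
    """
    returned = sum(
        1 for user, start in first_seen.items()
        if start + timedelta(days=n) in activity.get(user, set())
    )
    return returned / len(first_seen)

first_seen = {"u1": date(2024, 3, 1), "u2": date(2024, 3, 1)}
activity = {
    "u1": {date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 8)},
    "u2": {date(2024, 3, 1)},
}
print(n_day_retention(first_seen, activity, 1))  # 0.5
print(n_day_retention(first_seen, activity, 7))  # 0.5
```

Running the same function with n set to 1, 7, and 30 yields the three retention figures, and the right one to optimize depends on the bot’s purpose, as Köksal describes.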
By tracking the five chatbot metrics above, bot developers can develop a nuanced awareness of problem areas in their conversational flow, segment users to provide the best user experience, and boost long-term use and engagement.