At the Privacy Week conference in Vienna I gave a talk called “Privacy and Data Security of Chatbots.” I wanted to point out that you should consider what data to share with a bot and at which steps that data is processed when talking to a bot: How secure are the messenger apps themselves, the connection the data is transferred over, the natural language processing (NLP) and machine learning tools used, and, last but not least, the backend of the bot itself and its database?
A chatbot is a computer program that lets you interact with a service or company through a conversational interface. It runs inside a messaging app and can hold an intelligent conversation with one or more human users. Chatbots are also referred to as virtual assistants, and the broader trend is often called conversational commerce.
When Facebook announced at their F8 conference in mid April that they were going to open up their Messenger platform to bots, I was eager to try their API. So I created one of the very first chatbots on Facebook — and definitely Austria’s first Facebook Messenger and Skype chatbot — Mica, the Hipster Cat Bot.
If you’re interested in learning more about the basics of chatbots, I’ve already written some articles, such as Mica, the Hipster Cat Bot — Four Month After The Launch and Why emoji fit perfectly for chatbots.
However, messengers are widely used, and the success of bots raises a question: What about the data security and privacy of messenger apps and their chatbots?
EFF’s Secure Messaging Scorecard
We typically share very personal data when talking to each other over messenger apps. Messaging is a private and intimate activity, and messenger app providers are expected to keep their users’ data private.
We also assume that conversations between a user and a chatbot owner are not shared publicly without the user’s explicit consent. But what about the security of the platforms themselves?
In the face of widespread internet surveillance, we need a secure and practical means of talking to each other from our phones and computers. The Electronic Frontier Foundation created the Secure Messaging Scorecard to measure and communicate how secure “secure messaging” products really are.
Version 1.0 of the scorecard evaluated apps and tools based on seven specific criteria, ranging from whether messages were encrypted in transit to whether the code had been recently audited. Though all of those criteria are necessary for a tool to be secure, they can’t guarantee it; security is difficult, and some aspects of it are hard to measure.
This scorecard from November 2014 shows a security score for different platforms. Here you’ll see an extract of the analysis for different messenger apps:
As you can see on this scorecard, most messenger programs encrypt the message during transit, but some messengers, such as Kik or Skype, haven’t even been audited recently.
Some messengers open up their source code to independent reviews. Most of the messengers analyzed by EFF have no way to verify the identity of the contact (only Signal and WhatsApp provide this feature).
Some messenger apps are end-to-end encrypted, such as WhatsApp and Signal, meaning that the platform’s server is not reading the conversation. Some of these messengers provide an API for bots, such as Telegram, Skype, Facebook, and Kik. Usually with bots, the platform provider as well as the bot provider see the conversation unencrypted and hence have complete access to it.
The only messenger that would receive an A grade from EFF is Signal, while widely used apps such as Skype (around 300 million monthly active users) and Kik would get very bad grades.
In 2016 Viber also added end-to-end encryption to their service, but only for one-to-one and group conversations in which all participants are using the latest Viber version. Similar criticism comes with Allo, the new AI-based messaging app from Google, which ships with end-to-end encryption turned off by default. Encryption by default would be ideal, but Allo’s AI features depend on the server reading messages, so its NLP would not work that way.
Meanwhile the competition for the next main platform for chatbots has started: Facebook, Skype, Kik, and others are racing to be the major ecosystem for bots. Every bot platform tries to offer easy integration of bots and a great user experience. The paradox is that in messenger apps, the majority of conversations are private and personal between two people, and bots are now entering this domain.
Your personal data
Bots now enter the domain of personal and private communications. And we see a transfer of control over data from the user to the messenger app provider.
The same is already happening in China with WeChat and QQ, where people integrate the messenger app far more deeply into their personal lives, sending micro-payments to friends or paying bills and rent directly in WeChat.
WeChat Pay offers a lot of different services and became a single medium for all transactions, and Messenger wants to become this for the West.
Cloud-based AI tools
Personal data is worth a lot to Facebook or Google, and messenger platforms were not created initially with a focus on privacy.
Chatbots can analyze data with external tools for NLP and intent understanding. Usually the data is not end-to-end encrypted when sent to tools such as wit.ai, api.ai, or IBM Watson: it may travel over HTTPS, but the provider receives it in plaintext. These cloud-based APIs process users’ input for intelligent analysis and could analyze everything you write, which is especially critical when handling sensitive data such as financial account information or passwords.
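To make this concrete, here is a minimal sketch of what a bot backend might send to such a cloud NLP service. The endpoint and field names are hypothetical, not the real wit.ai or api.ai schema; the point is that HTTPS only protects the payload in transit, while the provider still decrypts and reads the plaintext on their servers.

```python
import json

# Hypothetical NLP endpoint - illustrative only, not a real API.
NLP_ENDPOINT = "https://nlp.example.com/v1/parse"

def build_nlp_request(user_id: str, message: str) -> bytes:
    """Serialize a user's message for the NLP provider.

    HTTPS encrypts this body only between the bot and the provider;
    the provider itself receives and can read the plaintext.
    """
    payload = {"user": user_id, "text": message}
    return json.dumps(payload).encode("utf-8")

body = build_nlp_request("user-42", "my account password is hunter2")
# The sensitive text sits in plaintext inside the request body:
assert b"hunter2" in body
```

Whatever the user types, including passwords or account numbers, ends up verbatim in that request body.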
What do bots know?
Usually bots don’t know much about their users initially: typically just a name, a screen name, and maybe a little additional data.
And this is only what the bot derives through the APIs of the messenger platform. Think of all the data you send to the bot. It is super easy to create character studies based on the text you send to a program.
For instance, sentiment analysis computationally identifies and categorizes the mood expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.
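To illustrate the idea, here is a deliberately simple, toy lexicon-based sentiment classifier. Real sentiment analysis tools use trained statistical models rather than word lists; this sketch only shows how easily a mood label can be derived from the raw text a user sends.

```python
# Toy sentiment lexicons - illustrative, not a production word list.
POSITIVE = {"love", "great", "good", "awesome", "thanks"}
NEGATIVE = {"hate", "bad", "awful", "stupid", "broken"}

def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this bot, it is great"))  # positive
print(sentiment("this is awful and broken"))      # negative
```

Run over a whole chat history, even something this crude starts to sketch a user’s attitude toward topics and products.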
The same concerns apply to other bot-related tools, such as speech-to-text converters, image recognition apps, linguistic analysis tools, and others.
However, consider also who else might be listening to your conversation, and I don’t mean only the bot developer or project managers. Currently the conversation between bot and user passes through the servers of the messenger platform, so Facebook or Google also listens to everything you say to a bot!
Bots usually also store contextual data, such as a geolocation or a conversation state (which data is needed at which step of the conversation?). This could also be a telephone number or other private data, and no one knows whether that data is encrypted before it is saved to a database.
Emotional reactions to conversations with bots
People, especially teenagers and seniors, tend to text with bots more than other groups do. Studies show that seniors tend to chat with Siri when they are lonely; the same happens with bots that are capable of conversation.
Users also tend to text with bots as if no one is listening. When Joseph Weizenbaum was building ELIZA, he noticed that one tester felt ashamed when he entered the room, saying, “Sorry, but I’m currently talking to ELIZA!”
Another interesting aspect is that people react emotionally to bots. They love them and tell the bot this, or they hate them and start using foul language. Based on this data, you can create personality profiles of bot users. So be careful what you write to a bot and what data you expose on platforms.