The Amazon Echo and Echo Dot were the best-selling products across Amazon this year. We are on the cusp of an era where talking to machines will be as natural as talking to humans.
With artificial assistants there is pressure to provide an experience: to give the customer an interaction that is as useful and as emotionally satisfying as talking to a human. A question to ask: is the imperative to make AI communicate like we do, or simply to be understood? Will we start to change how we speak to sound more like the AI?
With the rise of AI and the rush to automate the connected home and emerging services, the following three things are at risk of simply being optimized out of existence.
As children, we are taught manners through repetition and refusal; we are taught the importance of saying “please” if we want something. However, some parents are noticing their children being incredibly rude when demanding something from the Echo, a service with no notion of “please.”
Most would agree that as a society we believe manners are important. So if we are to avoid a demand-and-receive generation, we need to make a stand on both how we communicate with our virtual assistants and how they reply to us. We need nanny-modes to teach our children to say “please” when asking for something, and we need our assistants to match our manners when they reply.
As AI becomes more prominent and we become used to barking orders, when we actually have to speak to a person, will we even remember to say “please” and “thank you” — and will they even mind?
Companies are starting to look to AI to power call centers, creating artificial assistants to help customers on calls. As humans we convey much of how we feel not by the words we use but by our tone of voice. For an assistant to pick up on this, there are a number of linguistic markers that a computer could be taught to recognize and that, in combination, indicate the state of mind of its human caller. Rising pitch, increasing speed, a growing number of imperatives, and louder volume could be recognized as indicators of increasing anger and frustration.
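To make the "in combination" point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Utterance` fields, the function names, and the three-marker threshold are hypothetical, and a real system would need proper audio analysis to produce the numbers in the first place.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical per-utterance measurements (illustrative only)."""
    pitch_hz: float       # average pitch of the utterance
    words_per_sec: float  # speaking speed
    imperatives: int      # count of commands such as "fix it", "listen"
    volume_db: float      # average loudness


def rising_markers(earlier: Utterance, later: Utterance) -> list[str]:
    """Return the names of the markers that increased between two utterances."""
    markers = []
    if later.pitch_hz > earlier.pitch_hz:
        markers.append("pitch")
    if later.words_per_sec > earlier.words_per_sec:
        markers.append("speed")
    if later.imperatives > earlier.imperatives:
        markers.append("imperatives")
    if later.volume_db > earlier.volume_db:
        markers.append("volume")
    return markers


def seems_frustrated(earlier: Utterance, later: Utterance,
                     min_markers: int = 3) -> bool:
    """Flag frustration only when several markers rise together.

    The threshold of three is an arbitrary assumption, not a tested value.
    """
    return len(rising_markers(earlier, later)) >= min_markers


calm = Utterance(pitch_hz=180, words_per_sec=2.5, imperatives=0, volume_db=60)
heated = Utterance(pitch_hz=230, words_per_sec=3.4, imperatives=2, volume_db=68)
curious = Utterance(pitch_hz=210, words_per_sec=2.5, imperatives=0, volume_db=60)

print(seems_frustrated(calm, heated))   # all four markers rise
print(seems_frustrated(calm, curious))  # only pitch rises
```

Requiring several markers at once is the design choice that matters here: a lone rise in pitch is not treated as anger.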
But what about all the subtle beats that make language so rich, such as irony, sarcasm, or jokes? Would a rise in pitch at the end of a sentence indicate a joke, a question, or that you’re Australian? Trying to teach a computer to understand these quirks of language could lead to a lot of mistakes and upset customers. If the subtle complexities of language are constantly misunderstood by machines, are we then at risk of simply ceasing to use them? If the Turing test was based on polite conversation, perhaps we need a revised version that can test for sarcasm too.
Every nation has its own dialects, with nuanced vocabularies and cadences local to counties, communities, and even villages. Dialect and sociolect combined with our own unique vocabularies and vocal cords create idiolects, our own personal way of speaking that is as unique to us as a fingerprint.
There are already cases of conversational assistants struggling to understand thick regional accents, forcing users to simplify their words or ways of speaking. As we interact with machines more and more, does our increasing use of natural language mean that their limitations will force us to simplify, killing our own individuality? To avoid an automaton future, we should be focusing on programming AI to understand how we speak in all of our complexity.
Currently we have to manually tell a computer all the different ways a customer may ask for something. This is laborious, inaccurate, and quickly outdated. Over time a virtual assistant could “learn” how an individual speaks, through questions, corrections, and an increasing vocabulary. If these individual learnings could be pooled from all AI assistants globally, then overall our assistants would get smarter and potentially understand how we speak first time, wye aye!
When it comes to virtual assistants, we’ve been so preoccupied with what we can do that we haven’t spent enough time on the how. Our generation may find it a little uncomfortable and odd speaking to a machine, but this is the foundation level of a new technology, and we are responsible for how it teaches the next generation to interact. Language is constantly evolving, and we need to help our mechanical partners evolve alongside it.