When we all began switching to smart mobile devices, the age-old model of software development changed. In a short space of time, we witnessed the rise of Uber, Instagram, Venmo, Snapchat, and many others — companies now acquired for, or worth, billions of dollars. The introduction of a new form of interface between humans and machines created an opportunity, and when adopted correctly, it transformed businesses and cemented tech giants such as Apple, Facebook, and Google as the biggest companies in the world.
We’ve long expected voice to have a similar impact on how digital products and services are created and consumed. From cult references such as Star Trek’s “computer” to the movie Her, the possibility of using voice as an interface between humans and machines is obvious. However, as a way to interact with a product or service, voice has yet to become the foundation of the next “unicorn.” Will this change anytime soon? And if so, how can we design human-to-machine conversation authentically and ethically?
Finding a voice
The challenge for some digital products or services transferring their brand from mobile or the web might be finding their voice, if they haven’t already defined it. Although most successful brands will have defined a personality, they still need to consider how it translates into voice and sound. We shouldn’t forget, however, that one of the earliest brand experiences used sound: radio advertising. It’s easy for us “digital types” to overlook that brands have been using sound and voice since the 1920s. This has carried across to TV and real-life experiences. We all recognize Mercedes when we hear Jon Hamm’s velvety tones, or Visa when we hear Morgan Freeman’s distinctive voice, and hearing an American accent on a British Airways flight just wouldn’t be right. These are all great examples of brands owning voices or types of voices.
The lessons learned from these media can help brands work out what would work as their voice. Where is the brand from? How does it speak? What is its tone? What is its personality? What words does it use?
But today, people can talk to this voice and it has to respond. This is where authenticity is key.
A natural experience
In order to evolve the communication patterns between human and machine, we need to consider better what defines our own communication patterns.
Over the last century, linguists such as Noam Chomsky and Victoria Fromkin have analyzed and posed essential questions about how we communicate, such as the role of ambiguity. The interaction models we can design with have technical limitations, but researching and anticipating how meaning is conveyed when we communicate can help us build better experiences using voice.
People can more easily relate to a product, service, system, or an experience when they’re able to connect with it at a personal and emotional level. So, it’s natural that understanding humans emotionally is part of the puzzle when designing effectively for voice.
Spoken voice has many advantages over written text when it comes to how humans derive additional meaning, and brands that can harness this power in a useful, non-invasive way will be those that profit most. Google’s demonstration of its new Duplex technology drew a particular reaction when its response to the person on the other end needing time was an incredibly human “Mm-hmm.” This non-robotic response helps change the dynamic of conversation. And in the future, being able to react to emotion will be key.
As technology evolves to further this personalization in the design of conversation, we may see the different dynamics of human conversation work their way further into our designs for voice. A prominent example in human-to-human conversation is code-switching, in which we adjust how we talk to different people. Apparently, if you work in the service industry, a Southern accent is a sure-fire way to get better tips and more sympathetic customers. As a result, many people who work in restaurants pick up “y’all” immediately upon arriving at their job.
We may therefore worry about how polarizing a design based on these differences might be. Will people like our creation, given that we do not talk to all humans the same way? What attracts us to some people more than others anyway?
Creating an authentic voice
Voice technology is currently more limited than we are led to believe. Tailoring every response to every person and situation is not yet possible, so designing the perfect conversation (if there is such a thing) is still some way off.
But voice provides unique advantages as a familiar platform. There’s no doubt voice could follow some of the routes mobile has pioneered, such as using time of day or location-sensitive context to create new and innovative experiences.
Voice products or services could go even further. For example, voice printing technology can recognize if the person it’s interacting with is old enough to use a credit card (or other payment method) or indeed is the owner of it. The same technology could tell if someone is feeling happy or down and react accordingly — building a new paradigm for design in the user journey to react to the emotion of the end-user.
Adding or creating APIs that draw on all these technologies and capabilities will allow a more personalized experience.
An example of an authentic voice experience is Activision’s Alexa Skill for Destiny 2, created as a character from the game: Ghost. To add authenticity to the experience, Activision used the character’s actual voice and scripting to record the responses.
The team recorded several thousand lines and dynamically stitched them together to give the experience added credibility without the gamer becoming tired of hearing repeated content. Ghost also has an API to know where the gamer is in the game at any time, enabling interactions that relate directly to the player’s character and making the experience more personal and authentic.
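To make the idea of stitching varied lines together more concrete, here is a minimal sketch of one way to pick a recorded line for the player's current context while avoiding recent repeats. It is an illustration only, not Activision's implementation: the `LinePicker` class, the context names, and the clip filenames are all hypothetical.

```python
import random
from collections import deque


class LinePicker:
    """Chooses a recorded line for a context while avoiding recent repeats."""

    def __init__(self, lines_by_context, memory=2, rng=None):
        # lines_by_context maps a (hypothetical) game context to its clips.
        self.lines_by_context = lines_by_context
        # Remember only the last `memory` clips that were played.
        self.recent = deque(maxlen=memory)
        self.rng = rng or random.Random()

    def pick(self, context):
        candidates = self.lines_by_context[context]
        fresh = [line for line in candidates if line not in self.recent]
        # Fall back to the full pool if every clip was played recently.
        choice = self.rng.choice(fresh or candidates)
        self.recent.append(choice)
        return choice


# Hypothetical clips; in practice the context would come from a game-state API.
picker = LinePicker({
    "in_orbit": ["orbit_01.mp3", "orbit_02.mp3", "orbit_03.mp3"],
    "in_combat": ["combat_01.mp3", "combat_02.mp3", "combat_03.mp3"],
})
print(picker.pick("in_orbit"))
```

The larger the pool of lines per context relative to `memory`, the more random the selection feels; with a small pool, the picker simply cycles through the clips without playing the same one twice in a row.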
The importance of trust
Of central importance is the perception of trustworthiness. As creators of voice products and services, we must consider the ethics of the human-machine relationship. It’s also critical we remember the power we have when we design these products and services. Human interactions by nature are not meant to be “designed,” as they are a natural flow of consciousness. If we’re creating a product or service, we probably want a human to do, think, or feel something. Humans must always be aware of who they are speaking to, and the relationship they develop with any voice product or service must not be exploited. The importance of this has already been underlined where innovation meets the challenge of “creepiness.” The “uncanny valley” has been well documented for objects and robots, but it is set to be tested more thoroughly, and much sooner, with voice assistants.
Making it happen, today
To build a conversational interface, we need to consider voice, character, temperament, and personality, all of which help to define a believable and humanistic interaction that consumers can identify with. Getting these elements right involves even closer attention to user needs and journeys.
With all of this in mind, I’ll leave you with my five tips for designing an authentic and ethical conversation:
- Start with your onboarding experience. Determine how you will guide and structure the beginning of a conversation.
- Don’t just read your copy; roleplay it, because it will be delivered by voice.
- Whether you’re using Alexa or Google’s existing voices or pre-recorded audio, use variation in your responses. Break responses into stems that can be stitched together to give your conversation a more dynamic and humanistic feel.
- Think about security and privacy implications. Test to make sure your conversation would feel natural and trustworthy for users.
- List all edge cases and design the unhappy path as well as the happy path that consumers might follow. Plan responses for both.
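As a rough illustration of the tips on variation and unhappy paths, the sketch below combines response stems on the happy path and escalates re-prompts when the user is not understood. It assumes a hypothetical table-booking conversation; the stems, function names, and wording are made up for the example and are not tied to any particular voice platform.

```python
import random

# Hypothetical stems for a table-booking confirmation (happy path).
OPENERS = ["Sure.", "Okay.", "Got it."]
BODIES = [
    "Your table is booked for {time}.",
    "I've reserved a table at {time}.",
]

# Hypothetical re-prompts for the unhappy path, from gentle to more explicit.
FALLBACKS = [
    "Sorry, I didn't catch that. What time would you like?",
    "I still didn't get a time. You can say something like 'seven pm'.",
]
HANDOFF = "I'm having trouble understanding. Let's try again later."


def confirm(time, rng=random):
    """Stitch a random opener and body together so confirmations vary."""
    return f"{rng.choice(OPENERS)} {rng.choice(BODIES).format(time=time)}"


def fallback(failed_attempts):
    """Escalate re-prompts; once they run out, exit gracefully."""
    if failed_attempts < len(FALLBACKS):
        return FALLBACKS[failed_attempts]
    return HANDOFF


print(confirm("7 pm"))
print(fallback(0))
```

Three openers and two bodies already give six distinct confirmations, and capping the number of re-prompts keeps a misunderstood user from being stuck in a loop.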
Nicolas Carey leads the product design team at Potato, a digital product studio based in the U.S. and the UK.