Google Assistant is about to get a lot smarter in how it speaks to you, understands you, and sees the world. Google engineer Behshad Behzadi gave us a peek at some of the assistant’s improved natural language understanding and computer vision at Google Developer Days in Krakow, Poland, this week.
Behzadi called his demo a “mixture of things which are live and launched” and said each new feature would become available “within the next few months or the next year.” Here are some of the features that do not yet appear to be live or widely available.
Google Lens computer vision
We’ve known Google Assistant would work with Lens computer vision since Lens was first announced at the company’s I/O developer conference in May.
Google Lens can identify objects, text, and buildings when you point your camera at them. With Lens, Google Assistant gets the ability to see and can then talk to you about what it identifies via your camera.
In one example showcased at I/O, Lens was pointed at a theater and the assistant then searched online for tickets to shows at that theater. Behzadi showed Lens doing even more when the camera was pointed at different objects, including providing the number of calories in an apple and converting paper money to a different currency.
To showcase real-time translation, Behzadi told Google Assistant to “be my Vietnamese translator” and the assistant translated on the fly, both with text on the phone and through the Assistant’s voice. No details were provided about the number of languages that would be available for on-the-spot translation.
This feature is surfacing shortly before Apple’s expected announcement of the launch of iOS 11, which will include real-time translation with Siri.
Better contextual understanding
Google Assistant is also learning how to focus on the intent of your first question and then continue to answer follow-up questions about the initial topic. Once available, this feature will mean users can go deeper into understanding a topic without needing to restate their intent after every question.
So you will be able to say “Where is the Empire State Building?” and then follow that up with “I want to see pictures” or “Who built it?” to get results for the Empire State Building. Say: “What are the Italian restaurants around there?” and Google Assistant will serve up listings near the Empire State Building.
Following the same logic with image searches, saying “Show me Thomas” will bring up a picture of the most popular result, Thomas the Tank Engine, but if you say “Bayern Munchen team roster” and then say “Show me Thomas,” you’ll get photos of player Thomas Müller.
The ability to answer follow-up questions was first spotted in Amazon’s Alexa in late 2016.
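The follow-up behavior described above can be pictured as a conversation that remembers its current topic. The sketch below is purely illustrative and is not how Google Assistant is implemented: the `Conversation` class, the `interpret` method, and the hand-written entity list are all hypothetical, and a real assistant would rely on far richer entity recognition.

```python
class Conversation:
    """Toy dialogue state that remembers the most recently named entity.

    Hypothetical illustration only; not Google Assistant's actual design.
    """

    def __init__(self, known_entities):
        # Assumption: entities are supplied up front as plain strings.
        self.known_entities = list(known_entities)
        self.topic = None

    def interpret(self, query):
        """Attach the current topic to a query, updating it if a new entity is named."""
        lowered = query.lower()
        for entity in self.known_entities:
            if entity.lower() in lowered:
                self.topic = entity
                break
        # Follow-up questions that name no entity inherit the earlier topic.
        return {"query": query, "topic": self.topic}


chat = Conversation(["Empire State Building"])
first = chat.interpret("Where is the Empire State Building?")
follow_up = chat.interpret("Who built it?")
nearby = chat.interpret("What are the Italian restaurants around there?")
```

Here `follow_up` and `nearby` both carry the topic set by the first question, so a downstream search step would know what “it” and “there” refer to.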
Improved natural language understanding will also help the assistant handle vague, longer queries.
Say: “What is the name of the movie where Tom Cruise acts in it and he plays pool and while he plays pool he dances?” and Google Assistant will respond with the name of the movie (it’s The Color of Money), a summary, and the cast.
“This is possible by merging the power of search — the signals coming from Google search — with machine learning,” Behzadi said.
You can already tell Google Assistant to remember the name of your favorite sports team. In the future, you will be able to ask “How is my team doing?” to get current stats. And in upcoming updates, you will be able to teach Google Assistant more about your preferences.
On stage, Behzadi told Google Assistant “When the weather is more than 25 degrees (Celsius) I can swim in the lake of Zurich,” to which the assistant replied “OK, understood.” In the next question, he asked “Can I go swim in the lake of Zurich this weekend?” and Google replied “No, you can’t. The temperature is less than 25 degrees.”
Better understanding in loud environments
No specific details about improved speech recognition were provided, but Behzadi said Google Assistant is getting better at understanding voices in loud environments.
“We actually have spent lots of time on trying to improve the speech recognition in noisy environments, added lots of data to the machine learning systems behind automatically generated noise — like fake noise of a stadium or people or cars — and that’s actually how we’ve managed to significantly improve this,” he said.
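Behzadi did not describe the pipeline, but a common way to train on artificial noise is to mix a noise recording into clean speech at a chosen signal-to-noise ratio. The sketch below illustrates that general idea under stated assumptions, not Google's method: the `mix_at_snr` helper is hypothetical, a sine tone stands in for speech, and random Gaussian samples stand in for stadium or traffic noise.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(speech)
    # SNR_dB = 20 * log10(rms_speech / rms_noise), so solve for the noise gain.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Stand-ins for real audio (assumptions): a 440 Hz tone as "speech"
# and seeded Gaussian samples as "crowd noise".
rng = random.Random(0)
sample_rate = 16000
speech = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy = mix_at_snr(speech, noise, snr_db=10)
```

Repeating the mix with different noise clips and SNR values would turn one clean recording into many noisy training examples.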