We want to interact and engage with the world around us in ways that are increasingly fueled by technology.

To this end, Google today announced several AI-powered features for Voice, Lens, Assistant, Maps and Translate.

This includes “search within a scene,” which expands on Google Voice search and Google Lens, and enables users to point at an object, or to use live images coupled with text, to define a search.

“It allows devices to understand the world in the way that we do, so we can easily find what we’re looking for,” said Nick Bell, who leads search experience products at Google. “The possibilities and capabilities of this are hugely significant.”


For instance, Bell said, he recently bought a cactus for his home office that began withering – so he took a picture of it and at the same time searched for care instructions that helped him bring it back to life.

With another capability based on multimodal understanding, a user may be browsing a food blog and come across an image of a dish they want to try. But before they do, they want to know the ingredients and find well-rated local restaurants that offer delivery. Multimodal understanding recognizes the intricacies of the dish and combines that with stated intent by scanning millions of images, reviews and community contributions, Bell said.

This function will be available globally later this year in English and will be rolled out to additional languages over time. 

Google is similarly building out multi-search so that users can instantly glean insights about multiple objects in a scene. At a bookstore, for instance, they can scan an entire shelf and get information on all the books, as well as recommendations and reviews. This leverages computer vision, natural language processing (NLP), knowledge from the web and on-device technologies.

AI systems are allowing search to take “huge leaps forward,” Bell said.

“Search should not just be constrained to typing words into the search box,” he added. “We want to help people find information wherever they are, however they want to, based around what they see, hear and experience.”

No more ‘Hey Google’

Google has made it easier to initiate a conversation with its Google Assistant. With a “look and talk” feature, users no longer have to say “Hey Google” every time for the system to recognize that they are talking to it.

“A digital assistant is really only as good as its ability to understand users,” said Nino Tasca, director of Google Assistant. “And by ‘understand,’ we don’t just mean ‘understand’ the words that you’re saying, but holding conversations that feel natural and easy.”

Google has been working to parse conversational experiences, nuances and imperfections in human speech. This has involved significant investment in AI and speech, natural language understanding (NLU) and text-to-speech (TTS), all of which Google has bundled together into what it has dubbed “conversational mechanics,” Tasca said.

Analyzing AI capabilities, researchers realized they needed six different machine learning models, processing well over 100 signals – including proximity, head orientation, gaze detection, user phrasing, voice and voice match signals – just to understand that a user is speaking to Google Assistant. The new capability, on the Nest Hub Max, allows the system to recognize users and start conversations much more easily, Tasca said.

This will launch this week for Android and for iOS in coming weeks.

Another feature announced today is quick phrases: very popular commands, such as “turn it up” or “answer a phone call,” or requests to stop or snooze a timer.

“It’s just so much easier and faster to say ‘Set a timer for 10 minutes,’ than to have to say ‘Hey Google’ each and every time,” Tasca said.

More natural language enhancements to Google Assistant are based on how users talk in their everyday lives. Real conversations are full of nuances: people say “um,” pause or correct themselves. These types of nuanced cues can pass back and forth in under 100 or 200 milliseconds, yet each person is able to understand and respond accordingly, Tasca pointed out.

“With two humans communicating, these things are natural,” Tasca said. “They don’t really get in the way of people understanding each other. We want people to be able to just talk to the Google Assistant like they would another human and understand the meaning and be able to fulfill intent.”

Natural language enhancements to Google Assistant will be available by early 2023.

Mapping the world with AI

Additional new features leveraging advances in AI and computer vision are fusing billions of images from Street View with aerial photos to provide immersive views in Google Maps. These capabilities will be rolled out in Los Angeles, London, New York, San Francisco and Tokyo by the end of the year, with more cities following, according to Miriam Daniel, vice president of Google Maps.

“Over the last few years we’ve been pushing ourselves to continuously redefine what a map can be by making new and helpful information available to our 1 billion users,” Daniel said. “AI is powering the next generation of experiences to explore the world in a whole new way.”

With new Google Maps functions, for example, a user planning a trip to London might want to determine the best sights and dining options. In doing so, they can “virtually soar” over Westminster Abbey or Big Ben and use a time slider to see how these landmarks look at different times of day. They can also glide down to the street level to explore restaurants and shops in the area, Daniel said.

“You can make informed decisions about when and where to go,” she said. “You can look inside to quickly understand the vibe of a place before you book your reservations.”

Google Maps also recently launched the capability to identify eco-friendly and fuel-efficient routes. So far, people have used this to travel 86 billion miles, and Google estimates that this has saved more than half a million metric tons of carbon emissions – the equivalent of taking 100,000 cars off the road, Daniel said. This capability is now available in the U.S. and Canada, and will be expanded to Europe later this year.

“All these experiences are supercharged by the power of AI,” Daniel said.

Meanwhile, Google Translate announced today that it has been updated to include 24 new languages, bringing its total supported languages to 133. The new languages are spoken by more than 300 million people worldwide, according to Isaac Caswell, research scientist with Google Translate.

He added that there are still roughly 6,000 languages that are not supported. Still, the newly supported languages represent a great step forward, he emphasized. “Because how can you communicate naturally if it’s not in the language you’re most comfortable with?”
