How Google Goggles works to deliver visual search results for mobile phones

Google Goggles is one of the more interesting experiments being done in visual search. If you have an Android phone, you can already use the Google Goggles visual search app to take a picture of something and then submit it to Google as a search query. Google will take perhaps 6.5 seconds to send you three results that show you what the object is.

You can take pictures of a landscape, a barcode, a sign, products, text and other things. By and large, Google Goggles returns pretty good results within seconds. Over time, this technology will get better and better as both hardware and software improve, said David Petrou, staff engineer at Google Labs, in a speech today.

Petrou delivered his keynote talk on visual search at the Hot Chips conference at Stanford University in Palo Alto, Calif. He did so because visual search evidently requires a ton of processing power in a mobile device. It thus represents a huge computing challenge for chip designers, who have to come up with better, low-power processors to meet Google’s needs in the future.

Right now, the accuracy isn’t bad. Google Goggles delivers a single result (meaning it thinks it knows exactly what you want recognized) about 33 percent of the time. Google has more than 1 billion stored images that it can use to recognize the submitted queries. Google trains its search engines on those images to get better at recognizing them.

In most cases, people are doing one of two things when they submit a picture. They don’t know what the image is and want to know about it. Or they know what it is and they want to know more. Users have used the app to look up trivia, settle bets in bars, or just find out something while traveling.

“When it works, it’s magical,” said Petrou. “But we are just at the beginning and we have a long way to go.”

One of the challenges is that Google is trying to create a universal visual search tool. Wine Spectator can create an app that tells you more about the bottle of wine you are drinking (and it has). But Google has to make Goggles work for all search possibilities. Google also has to deliver specific results. If you take a picture of a chair, it can’t just tell you it is a chair. It has to tell you what kind of chair it is and who made it. Finally, it can only deliver a few results on a small cell phone screen, not thousands, so it has to prioritize its results.

Google can use the technology to recognize faces, but it chooses not to do so out of concern for privacy, Petrou said. Even so, people seem to want face recognition: roughly 25 percent of queries submitted to Google Goggles have faces in them. Google thus has to figure out a way to handle these searches without violating privacy.

Today, Google Goggles is structured so that Google receives the query and then dispatches it to a bunch of different search engines that work on it simultaneously. It can submit the query to a barcode reading engine or a text engine or a photo engine. Each engine delivers a piece of the puzzle back. Google synthesizes the results and then sends them back to the phone. Some tasks are easier to do; some are hard. Optical character recognition, or reading text inside an image, is particularly difficult and often takes a few seconds to do. On top of that, it takes time to get results because of delays in shipping large images over the cell phone network. Google also has to work hard to make sure it doesn’t deliver false results.
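
The dispatch-and-synthesize structure described above can be sketched roughly in Python. This is only an illustration, not Google's actual code: the engine functions, their canned results, the confidence scores, and the `visual_search` helper are all hypothetical stand-ins, and sorting by score is just one simple way to cut the list down to the few results that fit on a phone screen.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the specialized engines. Each returns a list of
# (label, confidence) pairs; the labels and scores are made up for illustration.
def barcode_engine(image):
    return [("barcode: 0123456789012", 0.20)]

def text_engine(image):
    return [("text: 'Chateau Example 2005'", 0.70)]

def photo_engine(image):
    return [("landmark: Golden Gate Bridge", 0.90),
            ("landmark: Bay Bridge", 0.40)]

ENGINES = [barcode_engine, text_engine, photo_engine]

def visual_search(image, max_results=3):
    """Send the same query to every engine at once, then merge the answers."""
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        futures = [pool.submit(engine, image) for engine in ENGINES]
        # Wait for every engine to finish before synthesizing a response.
        candidates = [hit for future in futures for hit in future.result()]
    # Keep only the highest-confidence hits for the small screen.
    candidates.sort(key=lambda hit: hit[1], reverse=True)
    return candidates[:max_results]

results = visual_search(b"fake-image-bytes")
for label, score in results:
    print(f"{score:.2f}  {label}")
```

Because the sketch waits for every engine before answering, the slowest engine sets the overall response time, which is the cost the article goes on to discuss.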

Petrou said he hopes that hardware designers will develop faster ways to send and process pictures as well as improve the processing power on the phones themselves. There are other ways to speed up the results. The independent search engines can send their results back immediately upon finishing, rather than waiting for Google to synthesize a result. That can save 3 seconds from the process, but it may also be less accurate.
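
The "send results back as each engine finishes" idea could look something like the following Python sketch. The engine names and their latencies are invented for illustration (the slow one loosely models the text engine, since the article notes that reading text is slow), and `stream_results` is a hypothetical helper, not part of any Google API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_engine(name, latency_s):
    """Build a fake engine that sleeps to simulate its processing time."""
    def engine(image):
        time.sleep(latency_s)
        return name
    return engine

# Hypothetical engines with made-up latencies, in seconds.
SIMULATED_ENGINES = [
    make_engine("text", 0.30),
    make_engine("barcode", 0.01),
    make_engine("photo", 0.10),
]

def stream_results(image):
    """Yield each engine's answer as soon as it is ready, fastest first."""
    with ThreadPoolExecutor(max_workers=len(SIMULATED_ENGINES)) as pool:
        futures = [pool.submit(engine, image) for engine in SIMULATED_ENGINES]
        for future in as_completed(futures):
            yield future.result()

arrival_order = list(stream_results(b"fake-image-bytes"))
print(arrival_order)
```

The trade-off matches the one in the article: the user sees the first answer sooner, but nothing has cross-checked it against the slower engines yet, so an early result may be less accurate than a synthesized one.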

Google is working on improvements. It acquired a company to help improve its results and is expecting to launch an iPhone app later this year. It is also figuring out how to build a more universal version based on HTML5, a new standard that is making its way into more phones. At some point in the future, Petrou hopes Goggles can be used to improve financial transactions, process credit cards, or determine if a user is happy or not. In short, the computational problem seems to have no end in sight, and that’s what hardware designers like to hear. It means that there will be a use for the chips they’re designing far into the future.
