That means a Glass wearer can look at something and CamFind will identify it within a matter of seconds. If it works, it could very well help Glass live up to its original promise of making life easier for its wearers.
What’s interesting about this app is that Image Searcher has beaten Google — which has touted its Google Goggles visual search app in the past — to a visual search app on the Glass platform.
Google Glass is a wearable device, worn like a pair of glasses, that gives users access to computing and camera functions on the go. CamFind brings visual search to Glass, using artificial intelligence to recognize images. It combines that with crowdsourcing: human annotators correct the identifications the software gets wrong.
CamFind can produce an accurate answer for an image query within 12 seconds, Dominik Mazur, chief executive of Los Angeles-based Image Searcher, told VentureBeat in an exclusive interview. If the computer vision recognizes the image on the first pass, the answer can arrive in milliseconds.
“It really takes the friction out of the process,” said Brad Folkens, chief technology officer, in an interview. “When we started this, we found that mobile search was broken. We make it so you can look at something and then get an answer. When you can do it fast, it turns from a novelty into a utility.”
Twelve seconds may seem like a long time for an answer in an age of instant gratification. But it’s a lot faster than any alternatives you might have while on the run, said Folkens. It can do things like identify brands of clothing. If you like someone else’s shoes, you can use the CamFind app to find out what they are. In the future, once e-commerce comes to Glass, you’ll be able to purchase what you see at “the moment of inspiration,” said Mazur.
CamFind has already successfully returned answers for about 17 million searches. That’s because Image Searcher’s earlier apps, TapTapSee, a virtual assistant product for the blind, and CamFind for the iPhone and Android, have been on the market for a while. Image Searcher has refined its algorithms with those apps, and it is now bringing the solution to Glass. With TapTapSee, a blind person could point a smartphone at something and the app would speak a description of the object in front of it.
A Glass wearer initiates the CamFind app with a verbal command: “OK Glass, what do you see?” Glass takes a picture at that moment, and CamFind searches its servers for a matching image. It then returns search results to the wearer, who sees them as text in the Glass eyepiece. CamFind produces tags, or keywords, describing what it believes the image contains. If the image isn’t clear, CamFind routes it to a human expert for an answer. Those humans are currently staffers employed by Image Searcher.
“Whenever we don’t have good confidence in the answer, we send it to the crowd,” Folkens said. “We use the humans to teach the computer how to answer the queries better. As time goes on, and we get more queries, we get better at answering the questions. We get much closer to 100 percent accuracy.”
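The hybrid flow Folkens describes can be sketched in a few lines of Python. This is purely illustrative: the function names, the stub behaviors, and the confidence threshold are assumptions for the sake of the sketch, not CamFind’s actual implementation.

```python
# Sketch of a confidence-gated recognition pipeline: try the computer
# vision model first, fall back to a human annotator when confidence is
# low, and keep the human's answer as a new training example.

CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff for trusting the model

training_examples = []  # human-validated (image, tags) pairs


def run_computer_vision(image_bytes):
    # Stub: a real system would run an image-recognition model here.
    # For illustration, treat very short inputs as "hard" images.
    if len(image_bytes) > 4:
        return ["red", "running shoe"], 0.95
    return ["object"], 0.30


def ask_human_annotator(image_bytes):
    # Stub: in production this would queue the image for a contractor
    # and wait (the roughly 12-second path the article mentions).
    return ["blurry photo of a shoe"]


def recognize(image_bytes):
    """Return descriptive tags for an image, fast path or crowd path."""
    tags, confidence = run_computer_vision(image_bytes)
    if confidence >= CONFIDENCE_THRESHOLD:
        return tags  # fast path: model is confident, milliseconds
    # Low confidence: route to a human, and record the answer so the
    # model can be retrained to handle similar queries on its own.
    human_tags = ask_human_annotator(image_bytes)
    training_examples.append((image_bytes, human_tags))
    return human_tags
```

The key design point is the feedback loop: every query the model fails lands in the training set, so accuracy should climb toward the “much closer to 100 percent” Folkens describes as query volume grows.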
That part of the solution requires the company to add people. While Image Searcher has 12 employees, it also has 64 contractors in the U.S. and 87 overseas.
“We are able to balance speed, cost, and accuracy,” Mazur said.
Folkens added, “We wanted to create a computer vision technology that everybody could use.”
“I believe the CamFind solution is a very innovative technology system,” said Eugenio Culurciello, founder of TeraDeep and an expert in topics like image tagging and computer vision. “While most companies have focused on computer-based content tagging of images, CamFind has used human annotators to create one of the largest human-validated image datasets, with more than 20 times the number of images of the largest datasets. This gives CamFind the ability to compete with internet giants such as Facebook and Google in the amount of data they have collected and the quality of the data.”
Culurciello, who is also a professor at Purdue University, added, “CamFind is now on Google Glass, and it is going to revolutionize content tagging, as there are many situations in our daily life where we look at something and ask ourselves, ‘What is that?’ Now CamFind on Glass can give us an answer without having to pull out a phone and do a few complex moves. I think this is finally the future coming true. Also, the Glass app is making it simpler to collect more data with less effort, bringing the value of CamFind higher than it ever was.”
Image Searcher has made its image recognition technology available through an application programming interface, so outside developers and companies can use it to improve their own visual search results. So far, 400 developers have downloaded the API over the past year and a half.
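A developer integration with such an image-recognition API would typically mean POSTing an image (or an image URL) with an auth token and reading back descriptive tags. The sketch below only builds such a request; the endpoint URL, field names, and token format are placeholder assumptions, not Image Searcher’s documented API, so consult the real API documentation before use.

```python
import json
import urllib.request

# Placeholder endpoint -- NOT Image Searcher's real API URL.
API_URL = "https://api.example.com/v1/image_requests"


def build_recognition_request(image_url, api_token):
    """Build (but do not send) a hypothetical image-recognition request.

    Assumes a JSON body with an 'image_url' field and token-based
    authorization -- both illustrative guesses about the API's shape.
    """
    payload = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": "Token " + api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller would send this request with `urllib.request.urlopen`, then poll or wait for the recognition result, which the article suggests arrives in milliseconds for easy images and in roughly 12 seconds when a human annotator gets involved.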
Google had previously created a smartphone app dubbed Google Goggles, which let you take a picture of a wine bottle and get useful information about it. But it didn’t work most of the time, and Google took down the Goggles app.
Meanwhile, CamFind has been downloaded more than 1.6 million times. As downloads grow, handling the processing load could become a challenge. But Mazur said the company believes its infrastructure partners will be able to handle it.
The results are pretty amazing for a startup created in 2012. Mazur and Folkens were both computer science and mathematics majors, and they have worked together for the past 12 years. They previously ran a company called Net Ideas, which created the university directory StateUniversity.com.
Image Searcher’s investors include 51 angels, including Kamran Pourzanjani (founded and sold Pricegrabber.com for $485 million), David Perry (who founded and sold Gaikai for $380 million), and an unnamed multibillionaire. Image Searcher is actively looking to raise a venture capital round. To date, it has raised $3.7 million in seed funding and a convertible note.
It’s not yet clear when the Glass app will be available. Image Searcher has submitted its app for approval.
“It’s finished, and we’re spreading the word about it,” Mazur said.
Asked what kind of competitor Google itself might be in the future, Mazur said, “That’s a really good question. Google might in the future put a search tool on Glass. But current state-of-the-art computer vision alone doesn’t come close to the accuracy that we have today. Maybe they can in five to 10 years. With our approach, we can deliver functional visual search today.”