The LDV Vision Summit took place in New York a couple of weeks ago, focusing on the high-potential area of computer vision. It covered everything from 3D imaging and VR to deep learning and Facebook Live (I even gave a presentation on augmented reality ads), and I came away convinced that this is an area that all of us — creatives, engineers, marketers, and investors — should be keeping an eye on. Here are 5 reasons why:
Just like there’s an Internet of Things, we’re going to have an Internet of Eyes
Connected cameras and visual sensors are increasingly everywhere and in everything, as LDV Vision Summit founder Evan Nisselson laid out in his introductory remarks. The combination of ubiquitous visual recording and connectivity to real-time big data is already resulting in smarter buildings that can adjust energy settings based on people’s movements. It will also lead to smarter homes, where built-in 3D scanners and gas sensors will let your mirror tell you, say, whether you’ve gained weight or how much alcohol you drank.
And, of course, once this happens, contextual “in-mirror” ads will let you buy low-calorie food and aspirin right from your bathroom and get it delivered in minutes.
Computer vision won’t just see what we see
Those of us in this field know that image recognition – the process by which computers break down images into pixels and recognize patterns in order to “see” what’s in them – isn’t the only thing that goes into computer vision. Computer vision also relies on real-time access to big data, geolocation, and other sensing modes such as ultrasound and thermal imaging, which let a computer “see” things that humans can’t: gas and heat, for example.
Computer vision’s promise goes way beyond processing vast amounts of images to quickly identify what humans can already see: It’s really about what humans can’t see. From augmented memory (getting a name and other relevant information about a person as soon as you lay eyes on them, through next-gen Google Glass-type devices) to “seeing” gas leaks, computer vision will enable superhuman perception from multiple channels in real time.
We’re not there yet. As Serge Belongie, Professor of Computer Vision at Cornell Tech, put it, “If you look at the state of the art in computer vision, we are not telling people something they don’t know about an image. We’re telling people there’s a banana and a bicycle in a picture.” Yes, that’s progress, but there’s so much more that computer vision can and will extract from a photo in the future.
If we can’t get enough training data, we’ll stall
Deep learning neural networks and cheaper, faster computing power have changed the game for image recognition, but these brilliant algorithms are nothing without reams of picture data to train on. From medical images to faces to hugs, the vast majority of pictures that could be used as training data are proprietary and in the hands of two behemoths: Google and Facebook. Considering the amount of sharing and uploading taking place on those companies’ consumer platforms, this concentration shows no signs of slowing down. As Greylock Partners’ Josh Elman reminded us during an onstage interview, it is a potential roadblock for computer vision if two big companies hold all the training data.
VR and AR need computer vision
Some believe virtual reality (VR) and augmented reality (AR) are over-hyped at the moment and that computer vision efforts being put into these technologies are a waste of energy. However, most people I’ve spoken with believe both VR and AR are here for the long term and that, in order to advance, they will need high-quality computer vision capabilities (such as using image recognition to improve “interactive” forms of VR).
Many people likened where VR is today to where the Internet was in the mid-’90s: perhaps not yet a mainstream and monetizable phenomenon, but one that will become widely adopted. The more computer vision advances, the more its capabilities will draw people to VR and AR, making monetization much more practical. After all, how good would a step-by-step AR instruction manual for an Ikea bookshelf be without accurate computer vision to tell you that you’re using the wrong screw in the wrong place?
Computer vision is already increasing human safety
Besides deep learning and AI, the other big tech trend these days is robotics. And, yes, most robots need (and will need) computer vision to do everything from butlering to assembly line inspection. Nanotronics, for example, uses image recognition to look for imperfections in computer memory wafers much faster and more precisely than humans can.
But even for small businesses and consumers, robotic help is on the way in the form of Carbon Robotics’ Katia, a sub-$5,000 robotic arm that uses computer vision to make sure it doesn’t kill people (robotic arm accidents are no laughing matter). According to Carbon Robotics, $5,000 or less is the threshold for making robotic arms mainstream, which is probably why Carbon Robotics won the Day 2 business challenge at LDV.
The LDV Vision Summit confirmed that these are still early days for computer vision but that current innovations have a bright future, both on a business level and on a broader, societal level. It also confirmed that there’s more opportunity than ever for young engineers to be part of an exciting, game-changing industry.
Ken Weiner is CTO at in-image ad platform GumGum.