Amid the fervor created by the Pokémon Go app, Signia Venture Partners cofounder Sunny Dhillon shared an important perspective in his July 14 story, “Stop referring to Pokémon Go as augmented reality.” Yes, the popularity of this game gives us a glimpse of consumers’ hunger for AR games, but the technology needed to interact with the real world is just not there yet.
As Dhillon points out, true augmented reality “requires computer vision and dynamic mapping of the real world environment.” In contrast, Pokémon Go characters rely solely on fixed latitude and longitude coordinates from Google Maps. If true augmented reality technologies were in use here, Dhillon explains, then inherent real-time depth mapping and object recognition would empower game characters to interact with the real world, keeping them out of incongruous play areas.
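To make the contrast concrete, here is a minimal sketch of what depth mapping adds over coordinate-only placement. It assumes the device can supply a per-pixel depth map of the real scene; the function name `visible_mask`, the tiny 2x4 depth map, and the distances are all hypothetical and chosen only for illustration. It is not how Pokémon Go or any particular AR toolkit works.

```python
from typing import List

def visible_mask(scene_depth: List[List[float]], character_depth: float) -> List[List[bool]]:
    """Return True for each pixel where the virtual character is closer to the
    camera than the real scene, i.e. where it should be drawn."""
    return [[character_depth < d for d in row] for row in scene_depth]

# Hypothetical depth map in meters: a tree trunk 2 m away fills the two
# middle columns, and the background is 10 m away.
scene = [
    [10.0, 2.0, 2.0, 10.0],
    [10.0, 2.0, 2.0, 10.0],
]

# A character placed 5 m away is hidden where the trunk is in front of it.
mask = visible_mask(scene, character_depth=5.0)
for row in mask:
    print(row)
```

A game that knows only latitude and longitude has no `scene_depth` to consult, so it can only draw the character on top of everything in the camera frame.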
True AR is still a holy grail, however, and the key to making augmented reality work will be AI-driven image recognition made possible by “unsupervised learning.” This will make it possible for devices to assess any image or video and understand it as well as we can – all in real time.
The limitations of current AI
To understand what’s meant by unsupervised learning, let’s first take a look at what’s currently in play. Leading companies like Google, IBM, and Facebook have been hard at work on AI technologies and improved image recognition, but they’ve faced limitations. Most of the current work in this area has focused on Deep Learning techniques – the creation of artificial neural networks with hundreds of computational layers.
Deep Learning offers massive computational power, but it has two critical flaws. First, as systems reach 1,000 layers, they plateau in computational ability and are hamstrung by an inability to scale further.
Second, the learning process required by this model depends on hundreds of hours of human guidance – we call this “supervised learning.” Computer scientists correct wrong answers and eventually the system learns from its mistakes. It’s effective within very specific data sets – like answering questions for a quiz show, or playing a board game – but cannot be applied to environments full of constantly changing variables, such as the natural world.
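The correct-the-wrong-answers loop described above can be sketched with a single perceptron, a far simpler model than the deep networks under discussion. The two numeric features per example and the "building / not building" labels are made-up illustrative data, and the function names are hypothetical. The point is only that every training example needs a human-supplied label.

```python
def score(weights, bias, x):
    """Weighted sum of the features plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    """Answer 1 ("building") if the score is positive, else 0."""
    return 1 if score(weights, bias, x) > 0 else 0

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Adjust the weights only when the prediction disagrees with the label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # 0 when the answer is right
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Hypothetical labeled examples: 1 = building, 0 = not a building.
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

Without the `labels` list there is nothing to compute `error` from, which is the dependence on human guidance that the paragraph above points to.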
And that’s a key word to consider in this conversation: natural. Deep Learning fails when applied to AR because it is an artificial system asked to understand naturally occurring environments.
Living creatures, of course, have no trouble with these tasks. Enabling computers to understand an environment just as accurately takes processes that more closely resemble those that evolved naturally. Unsupervised learning, which stems directly from the latest brain research, is the answer here.
Unsupervised learning differs from Deep Learning in that it doesn’t require human intervention. Instead of working through a limited data set with the intention of getting more answers right than wrong – “this is a building, this is not a building” – a system that employs unsupervised learning will amass thousands of signatures based on similar and dissimilar factors it “sees” in each image. These include signatures for colors, shapes, negative spaces, and combinations of all of these.
The system gains a full understanding of what a building looks like as a structure (and even what specific landmark it might be) rather than just answering whether or not that structure is a building. And because a self-learning system compiles signatures – a relatively simple and repetitive task – rather than trying to process “facts,” it requires fewer computational layers to operate and can scale infinitely, just like the human brain.
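As a rough illustration of grouping by similarity without labels, the sketch below runs plain k-means clustering on a few made-up two-number "signatures." K-means is a standard textbook technique swapped in here for illustration. It is not the signature-based system described in this article, and the data, the function names, and the choice of the first k points as starting centroids are all assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iterations=10):
    """Group points by similarity. No labels are supplied at any step."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points seed the groups
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical unlabeled signatures, e.g. two color or shape measurements per image.
signatures = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
groups, centers = kmeans(signatures, k=2)
print(groups)
```

The algorithm is never told what either group means. It only discovers that the first and third signatures resemble each other more than they resemble the second and fourth.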
What real AI-powered augmented reality could enable
Armed with a more complete understanding, AI-driven AR games will be able to fully integrate fictional characters’ actions and gameplay into natural settings. But improved gameplay is only the tip of the iceberg.
This is powerful technology, capable of transforming countless sectors. For example, consumers will soon be able to make better use of the thousands of photos taken on their phones and devices. Imagine a built-in AI assistant that can automatically organize images and videos, execute ultra-specific searches in milliseconds, and make sharing recommendations based on subject matter in images. Millions of forgotten photos will suddenly gain new life, and the potential of visual search can be unlocked.
Image recognition enabled by unsupervised learning will make driverless cars exponentially safer. Cars will spot pedestrians with complete clarity, perfectly identify road debris, and follow detours that may not yet be mapped.
In the medical field, doctors in the midst of surgical procedures will be able to get real-time information and comparisons from hundreds of related operations, with guidance based on challenges and solutions found all over the world.
The applications are nearly endless.
Dhillon accurately depicts AR as a movement that will only take shape when the technology can flawlessly integrate with the world. Pokémon Go is not there yet, but the technology will be available soon. The same innovations that will allow a virtual Pikachu to hide behind a real tree will be driven by unsupervised learning – a breakthrough that will fundamentally change how AI systems view the world.
Igal Raichelgauz is the cofounder and CEO of Cortica, creator of AI-driven image recognition technology.