In early June, Apple made its first attempt to enter the AR/VR space with ARKit. What makes ARKit stand out is its use of a technology called SLAM (Simultaneous Localization and Mapping). Every tech giant, especially Apple, Google, and Facebook, is investing heavily in SLAM, and whichever takes best advantage of it will likely end up on top.
SLAM is a computer vision technique that captures visual data from the physical world as a set of points, which the machine uses to build a map of its surroundings while tracking its own position within it. In effect, SLAM gives machines an eye and lets them understand what’s around them through visual input. The photo above shows, for example, what a machine sees in a simple scene using SLAM.
Using these points, machines can build an understanding of their surroundings. The same data helps AR developers like myself create much more interactive and realistic experiences. This understanding can be used in many scenarios, including robotics, self-driving cars, AI and, of course, augmented reality.
The simplest form of understanding this technology provides is recognizing floors, walls and barriers. Right now, most AR SLAM technologies, ARKit included, use only floor recognition and position tracking to place AR objects around you, so they don’t actually know what’s going on in your environment and can’t react to it correctly. More advanced SLAM technologies like Google Tango can create a mesh of the environment, so not only can the machine tell you where the floor is, it can also identify walls and objects, allowing everything around you to become an element to interact with.
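To make floor recognition a little more concrete, here is a deliberately simplified sketch of how a floor height could be guessed from a cloud of SLAM feature points. This is not how ARKit or Tango actually work internally. The function name, the `bin_size` parameter and the "most crowded horizontal slice" heuristic are all assumptions made for illustration.

```python
from collections import Counter

def estimate_floor_height(points, bin_size=0.05):
    """Guess the floor height from 3D feature points.

    points: iterable of (x, y, z) tuples in meters, with y pointing up.
    bin_size: thickness of each horizontal slice in meters (assumed value).

    Illustrative heuristic only: the floor is taken to be the horizontal
    slice that contains the most points.
    """
    # Count how many points fall into each horizontal slice.
    slices = Counter(round(y / bin_size) for _, y, _ in points)
    # Pick the most populated slice and convert it back to a height.
    best_slice, _ = slices.most_common(1)[0]
    return best_slice * bin_size
```

A heuristic like this would happily mistake a large table for the floor, which is exactly the kind of missing understanding described above.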
Before SLAM there was Marker-Based AR
A few years back, Apple acquired the leading German AR company, Metaio. Metaio was innovative and had a substantial lead in the AR market in those years. What we see with ARKit is an improved version of Metaio’s SLAM. Yes, even before Apple introduced ARKit, some companies such as Wikitude and Kudan offered SLAM on both Android and iOS. But what Apple introduced was far better than the other SLAM technologies available today.
Before this, most AR experiences were marker-based, meaning you needed a defined image to point your device’s camera at to see the AR experience. The defined image allowed your device to understand where to place the overlaid digital content and to track it. The problem with marker-based technology was that users needed a physical object (the image) to experience it, so companies had to promote both the application and the physical object (catalogues, brochures, etc.).
With ARKit this is now solved: you don’t need anything except your phone and your environment. But one important thing is still lacking: context!
Recognizing floors alone is not enough
Marker-based technology was limited, but it had context, meaning it had an understanding of the physical world (through the defined image) and could change the experience based on that. For example, you could point your device’s camera at a McDonald’s package and experience McDonald’s augmented reality content, or point the camera at a Starbucks cup in the same app and experience totally different content. These central apps are called AR browsers, and they will have a critical role in the future of AR.
So although ARKit is great technology, it lacks context, and its apps won’t have an understanding of where users are using them. Developers can use inputs like GPS data or ambient light to add more context, but context is not at ARKit’s core. Last week, developers made an interesting demo of using ARKit for navigation, but it’s important to note that such demos use GPS data as input. They can’t recognize locations via visual input and thus are nowhere near what Google Tango can do with its indoor navigation technology.
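As a rough sketch of why markers carry context, an AR browser can be thought of as a lookup from a recognized marker to the content tied to it. The marker IDs and experience names below are hypothetical and stand in for whatever a real image recognition step would return.

```python
# Hypothetical marker IDs and experience names, for illustration only.
EXPERIENCES = {
    "mcdonalds_package": "McDonald's AR experience",
    "starbucks_cup": "Starbucks AR experience",
}

def experience_for(marker_id):
    """Return the AR content tied to a recognized marker, or None.

    marker_id is assumed to come from an image recognition step that
    is not shown here.
    """
    return EXPERIENCES.get(marker_id)
```

The recognized image is the context: the same app shows different content depending on what the camera is pointed at, and nothing at all when it sees an image it doesn’t know.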
ARKit + CoreLocation, part 2 pic.twitter.com/AyQiFyzlj3
— Andrew Hart (@AndrewProjDent) July 21, 2017
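To give a sense of what using GPS data as input involves, here is a minimal sketch that converts the GPS coordinates of the device and of a target into an approximate east/north offset in meters, which an app could then use to position AR content. It is not taken from the demo above. The function name is made up, and the flat-Earth (equirectangular) approximation is an assumption that only holds over short distances.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def gps_to_local_offset(device_lat, device_lon, target_lat, target_lon):
    """Approximate (east, north) offset in meters from device to target.

    All inputs are in degrees. Uses an equirectangular approximation,
    which is reasonable only for short distances.
    """
    lat0 = math.radians(device_lat)
    d_lat = math.radians(target_lat - device_lat)
    d_lon = math.radians(target_lon - device_lon)
    north = d_lat * EARTH_RADIUS_M
    # Lines of longitude get closer together away from the equator.
    east = d_lon * math.cos(lat0) * EARTH_RADIUS_M
    return east, north
```

Note that nothing here looks at the camera image. The placement is only as good as the GPS fix, which is why this approach can’t tell one room from another indoors.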
There is no doubt that the future of AR is SLAM technology, but for it to be really useful, and not just for fun like Snapchat filters or landing a SpaceX rocket in your pool, it will need context. Other major companies like Google are aware of this.
Google is not in a hurry
Google is doing SLAM with its Project Tango, developed in partnership with companies like Lenovo. Tango uses two cameras to sense depth and builds an understanding of the world via SLAM maps. Unlike Apple’s ARKit, Project Tango has context at its heart, so it can support applications like “Indoor Navigation.” SLAM maps are databases of a machine’s visual understanding of the world, and they matter because they enable machines to interact with the physical world and to tell one place from another.
Despite recent analyses, Google is actually much further ahead in the AR game. Context is the most important part, and although Google’s Project Tango is unlikely to take off because it needs special hardware (two cameras to sense depth, which only a few devices currently support), Google already has context and a visual understanding of the world via Google Lens. This data will become much more valuable as people switch from mobile devices to wearables like AR glasses.
Facebook is trying to catch up
Google’s main competitor in the field of augmented reality is not Apple, but Facebook. Facebook has the advantage of a 2 billion user community, and once it develops a way to let its community handle the mapping, it will have great leverage. Unlike Apple, Facebook keeps its AR vision inside its own apps and doesn’t let developers use the technology inside theirs.
Analysts say that Apple letting developers build AR technology into their own apps gives it an advantage over Facebook. But as the fight over visual maps of the physical world heats up, Facebook keeping all of its data within its more tightly walled garden may give it an advantage.
Snap also has the advantage of a large community and is aware of this opportunity. In a recent patent, Snap introduced a technique for combining GPS data and SLAM maps to put related AR content in the real world. Meanwhile, Lenovo is also trying to build a SLAM database, called Augmented Human Cloud, in partnership with Wikitude.
One database to rule them all
The company with the most complete SLAM database will likely be the winner. This database will allow these giants to have an eye on the world, metaphorically speaking. For example, Facebook could tag your photo and know its location just by analyzing the image, Google could place ads and virtual billboards around you by analyzing the camera feed from your smart glasses, and your self-driving car could navigate with nothing more than visual data.
Every tech giant knows the importance of having this database, but each has its own advantages and disadvantages in this field.
- Apple has the least fragmented platform in the smartphone market and can easily give its users the power to experience AR on their smartphones, as it did with ARKit. But it lacks the community power.
- Facebook has the power of its community, but it lacks a platform of its own that would give it full control.
- Google has the power of platform with Android and also has products like Google Lens and Tango, but it lacks the power of community, and Project Tango is unlikely to take off because it needs special hardware and few companies are supporting its vision and approach.
A visual understanding of the physical world is something tech giants will fight over in the coming years, and companies like Apple that have lost out in areas like maps will be more careful this time. AR is predicted to be a billion-dollar market and may be the next big thing, and none of the tech giants wants to be left behind.