John Underkoffler was the science advisor for the landmark 2002 film Minority Report, and he designed the gesture-controlled user interface that Tom Cruise's character used to solve crimes.
In 2006, Underkoffler started Oblong Industries to build the next generation of computing interfaces, and in 2012, the company began selling commercial versions of the Minority Report interface. These are, famously, gesture-based systems where you can use a wand to make things happen on a big monitor.
But the interfaces do a lot more than that. They are spatial, networked, multi-user, multi-screen, multi-device computing environments. Architects can use them to zoom in on a drawing on the wall, allowing everybody in the room or those watching via video conference to see what’s being discussed.
I watched Oblong’s Mezzanine in action at one of the company’s clients, the architectural firm Gensler, which among other things designed the new Nvidia headquarters building in Silicon Valley. It was by far the coolest work room I’ve been in. I picked up conferencing windows and moved them around the screens in the room as if they were Lego pieces.
Oblong has sold hundreds of these systems to Fortune 500 companies, and it raised $65 million to bring these computing interfaces to the masses. I talked with Underkoffler at Gensler in San Francisco about his futuristic interface, as well as the accelerating inspiration cycle of science fiction, technology, and video games. This subject is the theme of our upcoming GamesBeat Summit 2017 event this spring.
Here’s an edited transcript of our conversation.
John Underkoffler: Claire is in a room very much like this one. Three screens at the front, some displays on the side. Part of what you’re about to see is that the visual collaborative experience we’ve built, this architectural computer you’re sitting in called Mezzanine, is actually shared. There is shared control in the room. Rather than being a one-person computer, like every other computer in our lives, it’s a multi-person computer. Anyone in the room can simultaneously, democratically inject content and move it around.
The pixels are owned by everyone. These are the people’s pixels. That’s true not just for us in this room, but for all the rooms we connect to. Anything we can do, Claire can do equally. She can grab control and move things around, contribute to the hyper-visual conversation if you will. The point here is to give you a sense of what we’re doing.
I’ll grab Claire there with the spatial wand, the conceptual legacy of the work I did on Minority Report with gestural computing, and we can move through a bunch of content like this. We can use the true spatial nature of Oblong’s software to push the entire set of slides, the content, back and scroll through this way. We can grab any individual piece and move it around the room – really around the room.
VentureBeat: You guys designed the wand?
Underkoffler: Yeah, the spatial pointing wand. It’s next door to the Minority Report gloves, which we’ve also built and deployed for more domain-specific situations. The glove-based gestural work is more sophisticated, more precise in some sense, but it’s also less general. There’s a bigger vocabulary. It’s easy, in a generic computing collaboration context like this, for anyone to pick up the wand and start moving things around the room.
If you are game to type one-handed for a second, I’ll give you the wand. If you just point in the middle of that image, find the cursor there, click and hold, and now you can start swinging it around. If you push or pull you can resize the image. You can do both of those things at the same time. When you have true six degrees of freedom spatial tracking, you can do things you couldn’t do with any other UI, like simultaneously move and resize.
This truly is a collaborative computer, which means that anyone can reach in, even while you’re working, and work alongside you. If you let go for a second, there’s Claire. She’s just grabbed the whole VTC feed and she’s moving it around. Gone is the artificial digital construct that only one person is ever doing something at a time. Which would be like a bunch of folks standing around on stage while one blowhard actor is just talking. We’re replacing that with a dialogue. Dialogue can finally happen in, rather than despite, a digital context.
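For readers curious how a single six-degrees-of-freedom pose can drive a move and a resize at once, here is a minimal sketch in Python. It is not Oblong's code, which the interview does not describe. The names `WandPose`, `cursor_on_screen`, and `drag` are hypothetical, the screen is assumed to be the plane z = 0 with the room on the positive-z side, and the choice that pushing toward the screen enlarges the image is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WandPose:
    """Hypothetical tracked pose: where the wand is and where it points."""
    position: Tuple[float, float, float]   # room coordinates, metres
    direction: Tuple[float, float, float]  # pointing vector, any length


def cursor_on_screen(pose: WandPose) -> Optional[Tuple[float, float]]:
    """Intersect the pointing ray with the assumed screen plane z = 0."""
    px, py, pz = pose.position
    dx, dy, dz = pose.direction
    if pz <= 0 or dz >= 0:
        return None  # wand is behind the screen or pointing away from it
    t = -pz / dz     # ray parameter at which z reaches 0
    return (px + t * dx, py + t * dy)


def drag(center, scale, grab: WandPose, now: WandPose):
    """Update an image's centre and scale from the pose at grab time and now.

    Translation follows the cursor on the screen; scale follows the change
    in the wand's distance from the screen (assumed: closer means larger).
    """
    c0, c1 = cursor_on_screen(grab), cursor_on_screen(now)
    if c0 is None or c1 is None:
        return center, scale  # no valid cursor, leave the image unchanged
    new_center = (center[0] + c1[0] - c0[0], center[1] + c1[1] - c0[1])
    new_scale = scale * (grab.position[2] / now.position[2])
    return new_center, new_scale
```

Because position and distance come from the same pose, one hand motion updates both values in the same call, which a two-dimensional mouse cannot do without a second control.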
VB: This works in conference rooms, then?
Underkoffler: It works in any setting where people need to come together and get some work done. The Fortune 1000, Forbes Global 3000 companies that we predominantly sell to occupy almost any vertical you can think of, whether it’s oil and gas or commercial infrastructure or architecture like Gensler. Commercial real estate. Hardcore manufacturing. Social media. Name a vertical, a human discipline, and we’re serving it.
The intent of the system itself is universal. People always need to work together. People are inherently visual creatures. If we can take work, take the stuff we care about, and deploy it in this hyper-visual space, you can get new kinds of things done.
Underkoffler: It’s how it feels to me. It should be as visual as the rest of the world. When you walk around the world, you’re not just seeing a singular rectangle a couple of feet from your face. You have the full richness and complexity of the world around you.
Even if you imagine human work spaces before the digital era – take an example like Gensler here, a commercial architecture and interior design space. Everyone knows what that style of work is. If, at the one o’clock session, we’re going to work on the new Nvidia building, we’ll come into a room with physical models. We walk around them and look at them from different points of view. You’ve brought some landscape design stuff. You unroll it on the table. We’re using the physical space to our advantage. It’s the memory palace idea all over again, but it’s very literal.
For the longest time – essentially for their whole history – computers and the digital experience have not subscribed to that super-powerful mode of working and thinking spatially. Mezzanine gives the world a computer that’s spatial. It lets us work in a digital form the way that we’ve always worked spatially.
Everyone knows the experience of walking up to a physical corkboard, grabbing an image, and untacking it from one place to move it over next to something else. That simple gesture, the move from one place to another, the fact that two ideas sit next to each other, contains information. It makes a new idea. We’ve just made that experience very literal for the first time in a digital context.
Although the result is, in a sense, a new technology and a new product, it’s not new for human beings, because everyone knows how to do that already. That’s an advantage for us and for our customers. Everyone knows how to use this room because everyone is already an expert at using physical space.
VB: What kind of platform is it? Is it sitting on top of Windows, or is it its own operating system?
Underkoffler: At the moment it’s a whole bunch of specialized software sitting on top of a stripped-down Linux. It runs on a relatively powerful but still commodity hardware platform, with a bit of specialized hardware for doing spatial tracking. That’s one of our unique offerings.
VB: Are the cameras more generic, or are they–
Underkoffler: Completely. Right now that’s a Cisco camera with a Cisco VTC. We’re equally at home with Polycom and other manufacturers. We come in and wrap around that infrastructure. A lot of our customers have already made a big investment in Cisco telepresence or Polycom videoconferencing. We’re not saying there’s anything wrong with that. We’re saying you need to balance the face-to-face human communication with the rest of the work – the documents and applications, the data, the stuff we care about. Although it’s nice to see people’s faces from time to time, especially at the beginning and end of a meeting, most of the time what we want is to dig in and get to the real work, the digital stuff, in whatever form that takes. From there you start injecting more and more live content, whatever that may be.
One of the experiences is browser-based. There’s a little tiny app you can download onto your Android or iOS platform, smartphone or tablet. A big part of our philosophy is that we want people to bring the tools they’re already most comfortable with as the way they interact with this experience. Anything I can do with the wand, I can also do with the browser. It’s very WYSIWYG. You drag stuff around.
If you like, you can take out your phone. The phone makes you a super-powerful contributor and user of the system as well. Anything you know how to do already in a smartphone context is recapitulated and amplified, so you’re controlling the entire room. You can grab that and move it around the space, dump it over there. You can upload content with the add button at the bottom.
That moment right there is indicative of what makes this way of working so powerful. If we were locked into a traditional PowerPoint meeting, there’d be no space, no way that anyone could inject a new idea, inject new content. Whereas here, in under three seconds, if we needed this bit of analog pixels stuck up there—you did the same thing simultaneously.
VB: So phones are the way to get a lot of analog stuff into the screens?
Underkoffler: Yeah. And we can plug additional live video feeds in. One thing that happens there is that we’re—again, we’re very excited about analog pixels. We’re not fully digital obsessives. We can do live side-by-side whiteboarding, even though we’re embedded in this more generic, more powerful digital context.
Then the pixels start to become recombinant. Let’s cut out that little bit and we can augment Claire’s idea with our crummy idea here. Then we can make one composite idea that’s both brilliant and crummy, just like that. That now becomes part of the meeting record. Everything we do is captured down here in the portfolio. Claire, on a tablet on that end, if she were inclined to correct our mistakes, could reach in and annotate on top of that.
In a way, what we’ve discovered is that the real value in computation is always localized for us humans in the pixels. Whatever else is happening behind the scenes, no matter how powerful, at the end of the day the information there is transduced through the pixels. By supercharging the pixels, by making them completely fluid and interoperable whatever the source may be – a PDF, the live feed from my laptop, the whiteboard, whatever – by making all the pixels interoperable we’ve exposed that inherent value. We make it accessible to everyone. Claire just used a tablet to annotate on top of the thing we’ve been working on.
VB: Is there some kind of recognition software that’s taking over and saying, “I recognize that’s being written on a whiteboard. Now I can turn that into pixels”?