VB: I like the way VR audio works, how audio draws you to a particular point of view.
Voltolina: Definitely. We believe audio drives half of the feeling of being present. If you just use stereo audio, or a mix that's not accurate, it's not the same.
VB: I do wonder where the technology is going to settle. Some of the early cameras had something like 36 modules. Why not use that many? Is that necessarily better?
Voltolina: The compromise is always the amount of data versus the benefit you get from it. We have eight cameras with a lot of overlap to create two layers of pixels. But we stop at a certain amount of data, because we want live monitoring, a workflow that's doable in real time. It's a bit like those cameras that can capture a really high number of megapixels, but then you need to transfer the data in a certain way before you can see the picture.
Also, it's the number of seams. If you increase the number of cameras, it's true that you're capturing at a much higher resolution, but then the number of seams you have to stitch is much higher. If you look at the wall over there, at that panel with the seams, if I just had one big piece of glass it would look better. But of course the cost is a tradeoff: a big piece of glass is hard to transport compared to a bunch of little ones. More seams add much heavier computation. And if a seam cuts through an interesting part of the video, my brain will pick up on it right away. I'll look right at it, and after that I can't ignore it.
We compromise with our number of cameras because we want some flexibility in where the seams go. With a very high overlap, I can choose to position the seam at different degrees. I can dynamically change it so that if a person's face is right on the seam, I can shift it to the left or right to avoid that problem. But if there are too many seams, as soon as I shift one to the right it'll impact the other ones nearby. Stopping at eight cameras gives you more space for that flexibility.
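The dynamic seam shifting he describes can be sketched roughly as follows. This is a hypothetical illustration, not Nokia's actual Ozo stitching code: given the angular band where two adjacent cameras overlap, and a list of angular intervals covered by detected faces, we pick the seam angle in the middle of the widest face-free gap.

```python
def place_seam(overlap_start, overlap_end, faces):
    """Pick a seam angle (degrees) inside [overlap_start, overlap_end]
    that avoids the angular intervals covered by faces.

    faces: list of (start_deg, end_deg) intervals to avoid.
    Returns the midpoint of the widest face-free gap in the overlap.
    """
    # Keep only face intervals that intersect the overlap, clipped to it.
    blocked = sorted(
        (max(s, overlap_start), min(e, overlap_end))
        for s, e in faces
        if e > overlap_start and s < overlap_end
    )
    # Walk the gaps between blocked intervals, tracking the widest one.
    best_gap, best_mid = -1.0, (overlap_start + overlap_end) / 2
    cursor = overlap_start
    for s, e in blocked:
        if s - cursor > best_gap:
            best_gap, best_mid = s - cursor, (cursor + s) / 2
        cursor = max(cursor, e)
    if overlap_end - cursor > best_gap:
        best_mid = (cursor + overlap_end) / 2
    return best_mid

# Two adjacent cameras share the 40-60 degree band; a face sits at 48-55,
# so the seam shifts left into the 40-48 gap.
print(place_seam(40.0, 60.0, [(48.0, 55.0)]))  # → 44.0
```

With more cameras and narrower overlaps, the face-free gaps shrink and shifting one seam starts to collide with its neighbors, which is the tradeoff he points to.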
VB: What about the difference between applications? You have Hollywood cinematographers, consumers, GoPro enthusiasts. It seems like different cameras will be useful for different people.
Voltolina: If you start from the top of the market, that’s where money and time is available right now. If I’m at the top, I can shoot and perfect my product very well. But it also means the data I’m capturing has to be the best data possible. I can extract every single drop out of that data because I have time and money.
As soon as you become more limited in your budget, by which I mean both time and money, you operate with a smaller crew. It's not that they don't follow the same steps, but each person winds up with two or three roles. In a big production you have a cameraman, a lighting person, a sound person, assistants, and so on. On a smaller team, maybe five people, one person will be the director and DP, and another will do sound and light. That means the product has to be able to do more things for more roles, all integrated.
Then you go all the way down to the one-man band. A news freelancer out there in a war zone, a guy who films weddings, or a guy who does educational videos for corporations. These might be $5,000 to $15,000 productions. Certainly not millions of dollars. But they need to work fast, because they need to make that money in a week, not over six months. The setup time becomes very important. Fast stitching and turnaround is very important, because they need to be able to show it to the customer, get an approval, and get paid.
Ozo, right now, is reaching the independent production stage. For the one-man kind of production, it's practically usable by one person, but the price is still significant. If I do weddings, I might rent an Ozo for one job. But most likely I wouldn't own one yet. At some point I might line up enough jobs to make the investment and have that as a differentiator for my customers. Even if you're not shopping for a VR experience, you might go to the guy who also does VR experiences, because that shows he's the most technically advanced. It becomes a marketing hook for professionals.
The market for 2D 360 video and VR experiences is rapidly expanding. But they’re still nowhere close to regular video. Like I say, we’re still in the brick phone era.
VB: It sounds like a fun territory to be in right now.
Voltolina: It's very interesting, yes. The part that's most intriguing to me is this area where we can watch the same video and have a different experience. I can share something with you that, even if we both watched the video, you haven't seen. A third person could come in with something we both haven't seen. From a social exchange point of view, it's fantastic. Even if we've all seen it once, that doesn't mean any of us have seen the same thing. Maybe I'll try to watch it the way you did a second time. It becomes a very interesting mechanism.
VB: Do you have any news coming up at CES?
Voltolina: We’ll keep you posted. [laughs] We’ll have some news. In general, we’ll have ongoing updates all year long, because we’re working on so many different fronts. We have the camera, the Ozo Live software, the Ozo Player, other technology to enable better viewing and stitching and live streaming. The last announcement on Ozo Live, for example, added support for multiple cameras. That’s a huge step.
VB: How big a part of Nokia is this? How many people are working on it?
Voltolina: A few hundred. Nokia Technologies overall is 800, 900 people. That includes digital health, digital media, and the licensing team. But we're definitely expanding. If you visit our website, you'll see we're hiring talent.
VB: Where is most of the work done? Is it Finland?
Voltolina: The majority of the R&D is in Finland. That’s where the project started. Now it’s maybe 65 percent Finland, 35 percent Sunnyvale. Sunnyvale is expanding. But it’s so competitive here. There’s a lot of VR expertise and a lot of VR investment. The expertise becomes a scarce resource. It’s like any wave of technology in Silicon Valley. As soon as everyone identifies a new wave, the highest concentration of investment is here and there’s a fight over the rock stars.
VB: What about augmented reality? Are you looking into that?
Voltolina: Definitely. AR is another area, though. AR has two meanings now. One is AR on your real surroundings, but there's also AR on captured video. As you can imagine, I can capture a video of a certain area and do AR not necessarily on what's around me, but on what's been captured. That area is extremely interesting. It's not just subtitles or overlays, additional data embedded in a video that's the same every time you watch it. With AR you can do it dynamically.
Again, you watch the same video, but depending on how you look at it or how you control it, different information can be overlaid. You can do that in a more interactive way. Every time you watch it you discover something new, extract different information. Up to a point of augmenting something that wasn’t there. What if something else was happening? But in an interactive way. What if I watched a recording of a meeting, but with different people there? Or the room was different somehow. All kinds of things.
You can see a convergence happening. There’s computer-generated VR and recorded VR. But you can easily imagine mixing those two together, in particular when the playback platform is the same.