Rendering 3D graphics for the latest high-resolution displays has never been an easy task, and the challenge is several times greater for VR headsets, which drive twin displays at high refresh rates. Oculus' parent company Facebook knows this all too well. Today, Facebook researchers revealed a new technique for upsampling real-time-rendered 3D content, using machine learning to instantly transform low-resolution, computationally easier imagery into a very close approximation of much higher-resolution reference materials.
The easiest way to understand Facebook's innovation is to imagine the Mona Lisa rendered as only 16 colored squares arranged in a 4×4 grid. A human looking at the grid would see an unforgivably jaggy, boxy image, perhaps recognizing the Mona Lisa's famous outlines, but a trained computer could instantly identify the grid and replace it with the original piece of art. Employing three-layer convolutional neural networks, Facebook's researchers have developed a technique that works not just for flat images but also for 3D rendered scenes, transforming "highly aliased input" into "high fidelity and temporally stable results in real-time," taking color, depth, and temporal motion vectors into account.
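To make the grid analogy concrete, here is a minimal sketch of the naive approach that a learned upsampler is meant to beat: average an image down to a coarse grid, then stretch it back up by copying each square. This is not Facebook's method. The 16×16 "image" (a single hard diagonal edge) and the helper names `downsample_block_average` and `upsample_nearest` are hypothetical, chosen only for illustration.

```python
# Naive baseline only: average an image down to a coarse grid, then stretch it
# back with nearest-neighbour copying. A learned upsampler tries to beat this.

def downsample_block_average(img, factor):
    """Average each factor-by-factor block of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [img[y][x] for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample_nearest(img, factor):
    """Repeat every pixel factor times along each axis (the boxy look)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Hypothetical 16x16 "image": a hard diagonal edge, 1.0 above it, 0.0 below.
SIZE = 16
image = [[1.0 if x > y else 0.0 for x in range(SIZE)] for y in range(SIZE)]

grid = downsample_block_average(image, 4)   # 4x4 grid = 16 squares
blocky = upsample_nearest(grid, 4)          # stretched back to 16x16

# Mean absolute error between the original and the boxy reconstruction.
error = sum(abs(a - b) for ra, rb in zip(image, blocky) for a, b in zip(ra, rb))
mae = error / (SIZE * SIZE)
print(f"grid: {len(grid)}x{len(grid[0])}, mean abs error: {mae:.4f}")
```

All of the error lands in the squares the edge passes through, which come back as flat gray blocks instead of a clean line. Recovering that lost detail, using color, depth, and motion from previous frames, is the job the trained network takes on.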
From a computational standpoint, the research suggests that a 3D environment rendered similarly to the original Doom game could be upscaled, with advance training, to a VR experience that looks like Quake. This doesn’t mean any developer could just convert a primitive 3D engine into a rich VR experience, but rather that the technique could help a power-constrained VR device — think Oculus Quest — internally render fewer pixels (see “Input” in the photo above) while displaying beautiful output (“Ours” in the above photo), using machine learning as the shortcut to achieve near-reference quality results.
While the specifics of the machine training are complicated, the upshot is that the network is trained using images grabbed from 100 videos of a given 3D scene, as real users would have experienced it from various head angles. Once trained, it allows a scene that would take 140.6 milliseconds to render at its full 1,600 by 900 resolution to instead be rendered at 400 by 225 pixels in 26.4 milliseconds, then upsampled 4×4 in 17.68 milliseconds, for a total of 44.08 milliseconds. That is roughly a 3.2x reduction in rendering time for a very close approximation of the original image. In this way, a Quest VR headset wearer would benefit from the scenario already having been thoroughly explored on much more powerful computers.
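The timing claim can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the constant names are illustrative.

```python
# Frame-time figures quoted in the article (milliseconds).
REFERENCE_MS = 140.6   # full-resolution render, 1,600 x 900
LOW_RES_MS = 26.4      # low-resolution render, 400 x 225
UPSAMPLE_MS = 17.68    # 4x4 neural upsampling pass

total_ms = LOW_RES_MS + UPSAMPLE_MS
speedup = REFERENCE_MS / total_ms

# Upsampling factor along each axis.
scale_x, scale_y = 1600 // 400, 900 // 225

print(f"low-res + upsampling: {total_ms:.2f} ms")
print(f"speedup vs. reference: {speedup:.2f}x")
print(f"upsampling factor: {scale_x}x{scale_y}")
```

The ratio works out to about 3.19x, which is where the "roughly 3.2x" figure comes from, and both axes shrink by a factor of 4, matching the 4×4 upsampling step.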
The researchers say that their system dramatically outperforms the latest Unreal Engine's temporal antialiasing upscaling technique, shown as Unreal TAAU above, by offering much greater accuracy of reconstructed details. They note that Nvidia's deep-learning super sampling (DLSS) is closest to their solution, but DLSS relies on proprietary software and/or hardware that might not be available across all platforms. Facebook suggests that its solution won't require special hardware or software and can be integrated easily into modern 3D engines, using their existing inputs to provide 4×4 upsampling at a time when common solutions use 2×2 upsampling at most.
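To put the 2×2 versus 4×4 distinction in concrete terms, the sketch below counts how many pixels a renderer would actually shade for a 1,600 by 900 output at each upsampling factor. It assumes a simple integer factor applied to each axis; the helper name `rendered_pixels` is hypothetical.

```python
def rendered_pixels(out_w, out_h, factor):
    """Pixels actually shaded when upsampling by `factor` along each axis."""
    if out_w % factor or out_h % factor:
        raise ValueError("output size must be divisible by the factor")
    return (out_w // factor) * (out_h // factor)


OUT_W, OUT_H = 1600, 900
full = OUT_W * OUT_H
for factor in (1, 2, 4):
    n = rendered_pixels(OUT_W, OUT_H, factor)
    print(f"{factor}x{factor}: {n:,} pixels ({n / full:.2%} of full resolution)")
```

A 4×4 factor means shading one-sixteenth of the output pixels, versus one-quarter for 2×2. Render time does not necessarily fall in proportion, though: by the article's own figures, cutting the pixel count 16-fold took the render from 140.6 to 26.4 milliseconds, about a 5.3x saving, presumably because some per-frame costs do not scale with resolution.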
As positive as the new system is, it’s unsurprisingly not perfect. Despite all the advance training and the temporally stable smoothness of the resulting imagery, it’s possible for some fine details to be lost in the reproduction process, such that text might not be readable on a sticky note (as shown above) if its presence wasn’t properly flagged within the last few frames of the low-resolution render. There are also still questions regarding the expense of implementation for “high-resolution display applications,” though more horsepower, better optimizations, and professional engineering are expected to improve the system’s performance.
The underlying research paper was published today as “Neural Supersampling for Real-Time Rendering,” attributed to Lei Xiao, Salah Nouri, Matt Chapman, Alexander Fix, Douglas Lanman, and Anton Kaplanyan of Facebook Reality Labs. It’s being presented at Siggraph 2020 in mid-July.