
In a technical paper quietly released earlier this year, IBM detailed what it calls the IBM Neural Computer, a reconfigurable parallel processing system designed to research and develop emerging AI algorithms and computational neuroscience. This week, the company published a preprint describing the first application demonstrated on the Neural Computer: a deep “neuroevolution” system that combines the hardware implementation of an Atari 2600, image preprocessing, and AI algorithms in an optimized pipeline. The coauthors report results competitive with state-of-the-art techniques, but perhaps more significantly, they claim that the system achieves a record training time of 1.2 million image frames per second.

The Neural Computer represents something of a shot across the bow in the AI computational arms race. According to an analysis recently released by OpenAI, from 2012 to 2018, the amount of compute used in the largest AI training runs grew more than 300,000 times with a 3.5-month doubling time, far exceeding the pace of Moore’s law. Keeping pace with this trend, supercomputers like Intel’s forthcoming Aurora at the Department of Energy’s Argonne National Laboratory and the AMD-powered Frontier at Oak Ridge National Laboratory promise in excess of an exaflop (a quintillion floating-point operations per second) of computing performance.
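The OpenAI figures hang together: growing 300,000-fold at a 3.5-month doubling time takes about 18 doublings, or roughly 64 months, which fits the 2012 to 2018 window. The short Python check below works through that arithmetic. The 24-month doubling period used for the Moore’s law comparison is an assumption, since the article does not state one.

```python
import math

# Figures quoted from the OpenAI analysis cited in the article.
GROWTH_FACTOR = 300_000   # growth in compute for the largest training runs
DOUBLING_MONTHS = 3.5     # reported doubling time


def months_to_grow(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor` at a constant doubling time."""
    return math.log2(factor) * doubling_months


def moores_law_growth(months: float, doubling_months: float = 24.0) -> float:
    """Growth over the same span if compute doubled every 24 months (assumed Moore's law pace)."""
    return 2 ** (months / doubling_months)


months = months_to_grow(GROWTH_FACTOR, DOUBLING_MONTHS)
print(f"{math.log2(GROWTH_FACTOR):.1f} doublings -> {months:.0f} months ({months / 12:.1f} years)")
# 18.2 doublings -> 64 months (5.3 years)
print(f"Moore's law over the same span: about {moores_law_growth(months):.0f}x")
# Moore's law over the same span: about 6x
```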

Video games are a well-established platform for AI and machine learning research. They’ve gained currency not only because of their availability and the low cost of running them at scale, but because in certain domains like reinforcement learning, where AI learns optimal behaviors by interacting with the environment in pursuit of rewards, game scores serve as direct rewards. AI algorithms developed within games have been shown to be adaptable to more practical uses, like protein folding prediction. And if the results from IBM’s Neural Computer prove to be repeatable, the system could be used to accelerate those AI algorithms’ development.

The Neural Computer

IBM’s Neural Computer consists of 432 nodes (27 nodes on each of 16 modular cards) based on field-programmable gate arrays (FPGAs) from Xilinx, a longtime strategic collaborator of IBM’s. (FPGAs are integrated circuits designed to be configured after manufacturing.) Each node comprises a Xilinx Zynq system-on-chip, which pairs a dual-core ARM Cortex-A9 processor with an FPGA on the same die, along with 1GB of dedicated RAM. The nodes are arranged in a 3D mesh topology, interconnected vertically with electrical connections called through-silicon vias that pass completely through silicon wafers or dies.

Above: A single card from IBM’s Neural Computer.

Image Credit: IBM

On the networking side, the FPGAs provide access to the physical communication links among cards in order to establish multiple distinct channels of communication. A single card can theoretically support transfer speeds up to 432GB per second, but the Neural Computer’s network interfaces can be adjusted and progressively optimized to best suit a given application.

“The availability of FPGA resources on every node allows application-specific processor offload, a feature that is not available on any parallel machine of this scale that we are aware of,” wrote the coauthors of a paper detailing the Neural Computer’s architecture. “[M]ost of the performance-critical steps [are] offloaded and optimized on the FPGA, with the ARM [processor] … providing auxiliary support.”

Playing Atari games with AI

The researchers used 26 out of 27 nodes per card within the Neural Computer, carrying out experiments on a total of 416 nodes. Two instances of their Atari game-playing application ran on each of the 416 FPGAs, scaling up to 832 instances running in parallel. Each instance extracted frames from a given Atari 2600 game, performed image preprocessing, ran the images through machine learning models, and performed an action within the game.
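The node and instance counts above follow from simple multiplication. The Python check below reproduces them and, assuming the reported 1.2 million frames per second is spread evenly across all instances (an assumption; the article does not describe how load is distributed), estimates each instance’s average frame rate.

```python
CARDS = 16
NODES_PER_CARD = 27         # 16 x 27 = 432 nodes in the full system
NODES_USED_PER_CARD = 26    # the experiments used 26 of the 27 nodes per card
INSTANCES_PER_FPGA = 2      # two game-playing instances per FPGA
REPORTED_FPS = 1_200_000    # record training throughput claimed in the preprint

total_nodes = CARDS * NODES_PER_CARD
nodes_used = CARDS * NODES_USED_PER_CARD
instances = nodes_used * INSTANCES_PER_FPGA
# Average share per instance, assuming an even split (not stated in the source).
fps_per_instance = REPORTED_FPS / instances

print(total_nodes, nodes_used, instances)               # 432 416 832
print(f"{fps_per_instance:.0f} frames/s per instance")  # 1442 frames/s per instance
```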

To obtain the highest performance, the team avoided emulating the Atari 2600 in software, instead using the FPGAs to implement the console’s functionality at higher frequencies. They tapped a framework from the open source MiSTer project, which aims to recreate consoles and arcade machines using modern hardware, and raised the Atari 2600’s clock from 3.58 MHz to 150 MHz. This produced roughly 2,514 frames per second, compared with the original 60 frames per second.
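The 2,514 figure is what you get if the frame rate scales linearly with the clock: 60 × 150 / 3.58 ≈ 2,514. The Python sketch below makes that proportionality explicit; linear scaling is an assumption that holds only if the recreated hardware preserves the console’s cycle-level timing.

```python
ORIGINAL_CLOCK_MHZ = 3.58
FPGA_CLOCK_MHZ = 150.0
ORIGINAL_FPS = 60.0


def scaled_fps(base_fps: float, base_clock: float, new_clock: float) -> float:
    """Frame rate after overclocking, assuming it scales linearly with the clock."""
    return base_fps * new_clock / base_clock


speedup = FPGA_CLOCK_MHZ / ORIGINAL_CLOCK_MHZ
fps = scaled_fps(ORIGINAL_FPS, ORIGINAL_CLOCK_MHZ, FPGA_CLOCK_MHZ)
print(f"{speedup:.1f}x faster, about {fps:.0f} frames per second")
# 41.9x faster, about 2514 frames per second
```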

In the image preprocessing step, IBM’s application converted the frames from color to grayscale, eliminated flickering, rescaled images to a smaller resolution, and stacked the frames into groups of four. It then passed these to an AI model that reasoned about the game environment and to a submodule that selected the action for the next frames by identifying the maximum reward as predicted by the AI model.
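The preprocessing and action-selection steps can be sketched in plain Python. This is a minimal illustration, not IBM’s FPGA implementation: the BT.601 grayscale weights, the pixel-wise maximum used for flicker removal (a common trick in Atari reinforcement learning pipelines), block-average downscaling, and every function and class name are assumptions, and the AI model is replaced by a hard-coded list of predicted rewards.

```python
from collections import deque


def to_grayscale(frame):
    """Convert rows of (R, G, B) pixels to luminance (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def remove_flicker(prev, curr):
    """Pixel-wise max of two consecutive frames, so sprites drawn on alternate frames survive."""
    return [[max(a, b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(prev, curr)]


def downscale(img, factor):
    """Average non-overlapping factor x factor blocks (assumes sizes divisible by factor)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y + dy][x + dx] for dy in range(factor) for dx in range(factor)) / factor**2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


class FrameStack:
    """Hypothetical helper holding the four most recent preprocessed frames."""

    def __init__(self, depth=4):
        self.frames = deque(maxlen=depth)

    def push(self, img):
        if not self.frames:
            # On the first frame, pad with copies so the stack is always full.
            self.frames.extend([img] * self.frames.maxlen)
        else:
            self.frames.append(img)  # deque drops the oldest frame automatically
        return list(self.frames)


def select_action(predicted_rewards):
    """Greedy choice: index of the action with the highest predicted reward."""
    return max(range(len(predicted_rewards)), key=predicted_rewards.__getitem__)


# Toy 4x4 frames: a red screen followed by a blue one.
prev = [[(255, 0, 0)] * 4 for _ in range(4)]
curr = [[(0, 0, 255)] * 4 for _ in range(4)]
small = downscale(remove_flicker(to_grayscale(prev), to_grayscale(curr)), 2)
stack = FrameStack().push(small)
action = select_action([0.1, 0.7, 0.3])  # stand-in for the model's output
print(len(stack), len(small), len(small[0]), action)  # 4 2 2 1
```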

Above: Results from the experiments.

Image Credit: IBM

Yet another algorithm, a genetic algorithm, ran on an external computer connected to the Neural Computer via PCIe. It evaluated the performance of each instance and identified the top performers, which it selected as “parents” of the next generation of instances.
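The selection step can be illustrated with a toy genetic algorithm. This Python sketch assumes truncation selection, elitism, and mutation-only Gaussian perturbation of a flat weight vector, the scheme used in earlier deep neuroevolution work such as Uber AI Labs’ Deep GA. The article does not describe IBM’s exact operators, and the toy fitness function is a hypothetical stand-in for a game score.

```python
import random


def evolve(population, fitness, n_parents, sigma, rng):
    """One generation: truncation selection plus Gaussian mutation (assumed operators)."""
    # Rank every individual by its score, e.g. the game score an instance achieved.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:n_parents]
    # Elitism: the best individual is copied into the next generation unchanged.
    next_gen = [parents[0]]
    while len(next_gen) < len(population):
        parent = rng.choice(parents)
        # Mutation only (no crossover): perturb every weight with Gaussian noise.
        next_gen.append([w + rng.gauss(0.0, sigma) for w in parent])
    return next_gen


def toy_fitness(weights):
    """Hypothetical stand-in for a game score: highest when every weight equals 3.0."""
    return -sum((w - 3.0) ** 2 for w in weights)


rng = random.Random(0)
population = [[0.0] * 5 for _ in range(50)]
best_initial = max(map(toy_fitness, population))
for _ in range(300):
    population = evolve(population, toy_fitness, n_parents=10, sigma=0.1, rng=rng)
best_final = max(map(toy_fitness, population))
print(f"best fitness: {best_initial:.2f} -> {best_final:.2f}")
```

Because the best individual always survives, the top score can never get worse from one generation to the next; all exploration comes from mutating the selected parents.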

Over the course of five experiments, IBM researchers ran 59 Atari 2600 games on the Neural Computer. The results imply that the approach isn’t data-efficient compared with other reinforcement learning techniques: it required 6 billion game frames in total and failed at challenging exploration games like Montezuma’s Revenge and Pitfall. But it outperformed a popular baseline, the Deep Q-network (DQN), an architecture pioneered by DeepMind, in 30 of 59 games after 6 minutes of training (200 million training frames), versus the DQN’s 10 days of training. With 6 billion training frames, it surpassed the DQN in 36 games while taking two orders of magnitude less training time (2 hours and 30 minutes).
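The “two orders of magnitude” comparison is straightforward arithmetic on the reported training times, and the short Python check below reproduces it: 10 days is 240 hours, and 240 / 2.5 = 96.

```python
import math

DQN_TRAINING_HOURS = 10 * 24    # "10 days of training"
NEUROEVOLUTION_HOURS = 2.5      # 2 hours and 30 minutes for 6 billion frames

speedup = DQN_TRAINING_HOURS / NEUROEVOLUTION_HOURS
print(f"{speedup:.0f}x less time, {math.log10(speedup):.2f} orders of magnitude")
# 96x less time, 1.98 orders of magnitude
```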

