XMOS unveils Xcore.ai, a powerful chip designed for AI processing at the edge

In 2005, Ali Dixon, then a student at the University of Bristol, founded fabless semiconductor company XMOS alongside former Oxford Semiconductor CEO James Foster, former Inmos chief architect David May, Hitesh Mehta of Acacia Capital Partners, and Noel Hurley. With seed funding from the University of Bristol enterprise fund and the Wyvern seed fund, as well as Amadeus Capital Partners, DJF Esprit, and Foundation Capital, the startup set about developing processor technology for voice, microphone arrays, audio, LED tiles, communications, and robotics products, with the goal of commercializing low-cost and efficient chipsets for internet of things devices.

It wasn't until 2017 that Bristol, England-based XMOS was in a place to make real progress, shortly after securing $15 million and spinning off its Graphcore division focused on server-side AI. The following three years culminated in the release of a low-cost, efficient AI chip dubbed xcore.ai, which the company officially unveiled this morning in a press release and accompanying white paper.

It's the third generation of the xcore architecture, which was originally conceived to offer control processing that would enable silicon engineers to design differentiated products. The first-generation architecture made its way into "hundreds" of applications bridging different I/O protocols, while the second generation bolstered control and digital signal processing performance through the addition of a dual-issue pipeline.

The latest xcore.ai is a crossover chip designed to deliver high-performance AI, digital signal processing, control, and input/output in a single device with prices from $1. It's architected to provide real-time inferencing and decisioning at the edge, in addition to the communications, signal control, and processing traditionally handled by powerful microcontrollers.

Inside the chip are 1MB of RAM (with up to 400Gbit/s of bandwidth) and 16 logical cores with support for scalar, float, and vector instructions, plus up to 128 pins of software-programmable I/O with low-latency interconnects. There's a requisite integrated USB 2.0 PHY and MIPI interface for data collection and processing across a range of cameras, time-of-flight sensors, radar chips, and more, and a high-performance instruction set for cryptographic functions.

Xcore.ai consists of a tile sporting the aforementioned cores tightly coupled with the RAM, which is split into two 512KB modules. Each processor has a dual-issue execution unit capable of executing instructions at twice the clock frequency, split among eight concurrent threads that each run software tasks executing I/O, control, DSP, and AI processing. A 2Gbit/s interconnect -- xconnect -- optionally allows two xcore.ai chips to be integrated into a single system, roughly doubling performance in certain tasks.

XMOS CEO Mark Libbett says that xcore.ai enables data to be processed within nanoseconds, thanks to its sophisticated AI model acceleration capabilities. A single hardware thread is able to capture, preprocess, and store data in nearly real time, while one or more other hardware threads ingest a previous data frame or unpack data and perform basic operations such as debiasing.
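The division of labor described above, with one thread capturing frames while another works on the previous frame, can be sketched as a two-stage pipeline. This is only an illustration of the structure in Python, using operating system threads rather than xcore hardware threads. The function names, the sample data, and the reading of "debiasing" as subtracting each frame's mean are all assumptions, not details from XMOS.

```python
import queue
import threading

def capture(frames, out_q):
    """Capture stage: hand each raw frame to the next stage, then signal the end."""
    for frame in frames:
        out_q.put(frame)
    out_q.put(None)  # sentinel: no more frames

def debias(in_q, results):
    """Processing stage: subtract each frame's mean (assumed meaning of 'debiasing')."""
    while True:
        frame = in_q.get()
        if frame is None:
            break
        mean = sum(frame) / len(frame)
        results.append([sample - mean for sample in frame])

# Hypothetical sensor frames, made up for this example.
raw_frames = [[1, 2, 3], [10, 10, 10], [0, 4, 8]]
handoff = queue.Queue(maxsize=1)  # capture can run one frame ahead of processing
processed = []

workers = [
    threading.Thread(target=capture, args=(raw_frames, handoff)),
    threading.Thread(target=debias, args=(handoff, processed)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

The bounded queue is the design choice that matters here: the capture stage never waits for processing to finish unless it gets more than one frame ahead, which is the overlap the quoted description relies on.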

"Xcore.ai delivers the world's highest processing power for a dollar," he said. "This, coupled with its flexibility, means electronics manufacturers (no matter their size) can embed multi-modal processing in smart devices to make life simpler, safer, and more satisfying for all."

The Xcore.ai processor is fully programmable in the C programming language, with features like a set of machine learning libraries and support for FreeRTOS, a real-time operating system for embedded devices that's been ported to 35 microcontroller platforms. A converter for TensorFlow Lite, a lightweight version of Google's TensorFlow framework optimized for low-power devices, allows prototyping and deployment of AI models, and binary values for the activations (equations that determine the output of AI models) and weights (parameters that transform input data with the models) dramatically reduce execution time.
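To see why binary activations and weights cut execution time, consider a dot product in which every value is either +1 or -1. Packed one bit per value, the whole product reduces to an XNOR and a bit count. The Python sketch below shows that general technique; it is not XMOS's implementation, and the function names and sample vectors are invented for illustration.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an integer, one bit per value (1 means +1)."""
    word = 0
    for i, value in enumerate(values):
        if value == 1:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")    # population count
    return 2 * matches - n             # matches minus mismatches

# Hypothetical activation and weight vectors.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))  # ordinary multiply-and-add
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
```

Both `reference` and `result` come out to the same value, but the packed version handles all eight elements with a couple of bitwise operations instead of eight multiplications.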

XMOS claims that, compared with semiconductor company Arm's 32-bit RISC processor cores for low-cost and energy-efficient microcontrollers, xcore.ai delivers a 32 times improvement in overall AI performance. That's not to mention 16 times and 15 times faster I/O processing and signal processing performance, respectively, and support for vector arithmetic up to 38.4 billion multiply-accumulates (operations that multiply two numbers and add the product to a running total) per second and up to 1 million 512-point fast Fourier transforms (FFTs) per second. (Fourier analysis converts a signal to a representation in the frequency domain and vice versa.)
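For readers unfamiliar with the two operations behind those figures, here is a minimal Python sketch of a multiply-accumulate and a 512-point fast Fourier transform. It is a textbook radix-2 FFT written for clarity, not the routine xcore.ai runs, and the test tone is a made-up input.

```python
import cmath
import math

def mac(acc, a, b):
    """One multiply-accumulate: multiply a by b and add the product to acc."""
    return acc + a * b

def dot(xs, ys):
    """A dot product is a chain of multiply-accumulates."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

def fft(samples):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

N = 512
TONE_BIN = 5  # hypothetical test tone: 5 cycles across the 512-sample frame
signal = [math.cos(2 * math.pi * TONE_BIN * i / N) for i in range(N)]
spectrum = fft(signal)

# The strongest frequency component should sit at the tone's bin.
peak_bin = max(range(N // 2), key=lambda k: abs(spectrum[k]))
```

The transform turns 512 time-domain samples into 512 frequency bins, and the peak lands in the bin matching the tone, which is the time-to-frequency conversion the parenthetical above describes.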

"The utility of the first two generations of xcore, including their unique IO programmability and hard real-time performance is demonstrated by the diversity of applications for which it has been used -- from motor and motion control through preemptive maintenance systems to numerous audio products and children's toys," said XMOS in a statement. "We can only imagine what our developer community will do with all of the additional capabilities of xcore.ai."

The AI chip market is anticipated to be worth $91.18 billion by 2025, and dedicated AI chip startups -- among them Kneron, Blaize, AIStorm, Graphcore, Quadric, and Esperanto Technologies -- raised $1.5 billion in 2017 alone. But XMOS is well-funded, with over $94.8 million in total venture capital raised to date from previous and strategic investors Robert Bosch Venture Capital, Huawei, and Xilinx. And in anticipation of formidable new competition, it has made a number of acquisitions, like that of SETEM, a company specializing in audio algorithms for source separation.
