LightOn researchers explain how they trained an AI model on an optical co-processor

In a technical paper published on the preprint server Arxiv.org, scientists at LightOn, a startup developing optical computing hardware for AI applications, detail what they claim is one of the first optical co-processors able to accelerate AI model training. In experiments on MNIST, a popular data set of handwritten digits, their co-processor (the Optical Processing Unit) ostensibly helped train a model to recognize digits with 95.8% accuracy, compared with 97.6% for the same model trained on a graphics card.

Photonic integrated circuits, which are the foundation of LightOn's chip, promise a host of advantages over their electronic counterparts. They require relatively little energy because light produces less heat than electricity does, and they're less susceptible to changes in ambient temperature, electromagnetic fields, and other noise. Latency in photonic designs is up to 10,000 times lower than in silicon equivalents, at power consumption levels "orders of magnitude" lower, and certain model workloads have been measured running 100 times faster than on state-of-the-art electronic chips.

According to the paper, LightOn researchers used an in-house optical chip modified to use off-axis holography, in which a small angle between the reference and object beams keeps their interference terms from overlapping, in tandem with a technique known as direct feedback alignment (DFA). In machine learning, DFA sends fixed random projections of a model's output error to each layer as its training signal, which lets each layer update independently of the others.
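To make the holography idea concrete, here is a minimal one-dimensional sketch in Python with NumPy. It assumes a noiseless camera, a unit-amplitude tilted reference beam, and a signal field whose spatial frequencies stay below a known bandwidth; the function names (`record_hologram`, `recover_field`) and all sizes are hypothetical, and this illustrates the general principle rather than LightOn's actual processing. The camera sees only intensity, but because the tilt shifts the useful interference term away from the others in frequency, a simple filter recovers the full complex field, phase included.

```python
import numpy as np

def record_hologram(field, carrier_bin):
    """Intensity a camera would record when `field` interferes with a tilted reference."""
    n = field.size
    x = np.arange(n)
    # A small tilt between the beams appears as a linear phase ramp across the sensor,
    # i.e. a spatial carrier frequency of `carrier_bin` cycles per frame.
    reference = np.exp(2j * np.pi * carrier_bin * x / n)
    intensity = np.abs(field + reference) ** 2  # real-valued: phase is not measured directly
    return intensity, reference

def recover_field(intensity, reference, halfwidth):
    """Recover the complex field, assuming it is band-limited to |k| < halfwidth."""
    n = intensity.size
    # intensity = |field|^2 + 1 + field*conj(ref) + conj(field)*ref.
    # Multiplying by ref moves only the field*conj(ref) term back to baseband;
    # the other terms land near +carrier or +2*carrier.
    spectrum = np.fft.fft(intensity * reference)
    bins = np.fft.fftfreq(n, d=1.0 / n)  # integer frequency index of each FFT bin
    spectrum[np.abs(bins) >= halfwidth] = 0  # low-pass: keep only the baseband term
    return np.fft.ifft(spectrum)

# Build a band-limited test field with random amplitude and phase.
rng = np.random.default_rng(0)
n, halfwidth, carrier = 256, 8, 40  # carrier > 3 * halfwidth (and n large) keeps terms apart
spec = np.zeros(n, dtype=complex)
ks = np.arange(-halfwidth + 1, halfwidth)  # frequencies with |k| < halfwidth
spec[ks] = rng.normal(size=ks.size) + 1j * rng.normal(size=ks.size)
field = 0.1 * np.fft.ifft(spec) * n

intensity, reference = record_hologram(field, carrier)
recovered = recover_field(intensity, reference, halfwidth)
print(np.max(np.abs(recovered - field)))  # tiny: floating-point round-off only
```

Shrinking `carrier` toward `halfwidth` makes the terms overlap in frequency and the recovery fail, which is exactly the overlap the angle between the beams is meant to prevent.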

A typical AI model consists of "neurons" (mathematical functions) connected into one or more dense layers. Signals transmitted along the neurons adjust the synaptic strengths (weights) of the connections, and in this way, the model extracts features from data and learns to make predictions. Backpropagation (backward propagation of errors) is commonly used to send these signals and make the adjustments, but it prevents layers from being processed asynchronously: each layer's update depends on error signals passed back from the layers after it, which introduces inefficiency.
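The difference between the two training signals can be sketched for a network with one hidden layer. The NumPy code below is a minimal illustration, assuming ReLU hidden units, a softmax output with cross-entropy loss, and illustrative sizes (784 inputs, 1,024 hidden units, 10 classes) that are not taken from the paper; all function names are hypothetical. Backprop routes the output error through the transpose of the next layer's weights, while DFA sends it directly to the hidden layer through a fixed random matrix `B1`.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def relu_grad(a):
    return (a > 0).astype(a.dtype)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)

def forward(x, W1, W2):
    a1 = x @ W1               # hidden pre-activations, shape (batch, hidden)
    h1 = relu(a1)
    y_hat = softmax(h1 @ W2)  # class probabilities, shape (batch, classes)
    return a1, h1, y_hat

def hidden_delta_backprop(err, a1, W2):
    # Backprop: the hidden layer's signal is computed by passing the error back through W2.
    return (err @ W2.T) * relu_grad(a1)

def hidden_delta_dfa(err, a1, B1):
    # DFA: the output error reaches the hidden layer through a fixed random matrix B1,
    # so this signal never reads W2.
    return (err @ B1) * relu_grad(a1)

def dfa_step(x, y_onehot, W1, W2, B1, lr=0.1):
    a1, h1, y_hat = forward(x, W1, W2)
    err = y_hat - y_onehot  # output error for softmax + cross-entropy
    d1 = hidden_delta_dfa(err, a1, B1)
    batch = x.shape[0]
    # Once err is known, both weight updates use only local quantities.
    W2_new = W2 - lr * (h1.T @ err) / batch
    W1_new = W1 - lr * (x.T @ d1) / batch
    return W1_new, W2_new

# Hypothetical toy data: random "images" and random one-hot labels.
rng = np.random.default_rng(0)
d_in, d_h, d_out, batch = 784, 1024, 10, 32
x = rng.random((batch, d_in))
y = np.eye(d_out)[rng.integers(0, d_out, size=batch)]
W1 = rng.normal(scale=0.01, size=(d_in, d_h))
W2 = rng.normal(scale=0.01, size=(d_h, d_out))
B1 = rng.normal(scale=0.01, size=(d_out, d_h))  # drawn once, never trained
W1, W2 = dfa_step(x, y, W1, W2, B1)
```

Passing `W2.T` in place of `B1` reproduces backprop's hidden-layer signal exactly, which makes a useful sanity check. DFA's bet is that a random `B1` works too, because during training the forward weights tend to come into alignment with the fixed feedback.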

In pursuit of a faster, optically based DFA approach, the LightOn researchers had their chip encode a numerical representation called a vector onto a light beam using a component that spatially modulates the light. The beam propagated through a diffuser, producing an interference pattern (a speckle) whose intensity a camera recorded. This allowed the chip to deliver random projections of the model's error at very large scales: theoretically over a hundred billion parameters, which in this context refers to the configuration variables internal to a model that determine how well it performs on a problem.
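A common idealization treats a diffuser as a fixed matrix of complex Gaussian random numbers, so the speckle a camera sees is the squared magnitude of a random projection. The sketch below assumes that model and ignores noise, the modulator's quantization, and the camera's limited dynamic range; the class and function names are hypothetical, and taking the real part of the recovered field as the DFA feedback is an illustrative choice, not a documented detail of LightOn's chip. It shows why a linear readout matters: DFA needs a projection that scales linearly with the error, which raw intensity does not.

```python
import numpy as np

class SimulatedDiffuser:
    """Toy model of light passing through a scattering medium (hypothetical class).

    Assumes the medium acts as a fixed complex Gaussian transmission matrix T.
    """

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(2 * n_in)
        self.T = scale * (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in)))

    def camera_intensity(self, x):
        # A plain camera records |T x|^2: quadratic in x, phase discarded.
        return np.abs(self.T @ x) ** 2

    def holographic_field(self, x):
        # An interferometric readout (such as off-axis holography) can recover T x itself,
        # which is linear in x; this toy model simply returns it.
        return self.T @ x

def dfa_feedback(diffuser, error):
    # Assumption: use the real part of the field as a fixed, real random projection B @ error.
    return diffuser.holographic_field(error).real

diffuser = SimulatedDiffuser(n_in=10, n_out=1024)
e = np.random.default_rng(1).normal(size=10)  # e.g. a 10-class output error
signal = dfa_feedback(diffuser, e)            # one feedback value per hidden unit
```

In NumPy the matrix product costs time proportional to `n_in * n_out`; the appeal of the optical version is that the light performs the multiplication in a single pass, with the usable matrix size bounded by the resolution of the modulator and camera.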

During experiments, the coauthors trained a model comprising 1,024 neurons for 10 epochs, which means each sample in MNIST had an opportunity to update the parameters 10 times. Running at 1.5 kHz, LightOn's co-processor performed 1,500 random projections per second, consuming about 30 watts of power -- an order of magnitude more power-efficient than the average graphics card.
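A back-of-envelope calculation puts the reported figures in perspective. It assumes one optical projection per training example per epoch (the article does not say how examples were batched) and the standard 60,000-image MNIST training split; `optical_budget` is a hypothetical helper, and the result covers only the co-processor's optical projections, not the rest of training.

```python
MNIST_TRAIN_IMAGES = 60_000     # size of the standard MNIST training split
EPOCHS = 10                     # from the article
PROJECTIONS_PER_SECOND = 1_500  # 1.5 kHz, from the article
POWER_WATTS = 30                # approximate draw, from the article

def optical_budget(examples, epochs, rate_hz, power_w):
    """Return (projections, seconds, joules) spent on optical projections alone.

    Hypothetical helper. Assumes one random projection per example per epoch,
    which the article does not state.
    """
    projections = examples * epochs
    seconds = projections / rate_hz
    return projections, seconds, power_w * seconds

projections, seconds, joules = optical_budget(
    MNIST_TRAIN_IMAGES, EPOCHS, PROJECTIONS_PER_SECOND, POWER_WATTS
)
print(f"{projections:,} projections, {seconds:.0f} s, {joules / 1000:.0f} kJ")
# -> 600,000 projections, 400 s, 12 kJ
```

Under those assumptions the optical step would take about 6.7 minutes and use roughly 3.3 watt-hours; a different batching scheme would change both figures.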

The researchers postulate that switching to a different holography scheme will make it possible to perform calculations involving over a trillion parameters, but they leave this to future work. "As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing," they wrote. "We expect performance to improve [on our chip] with the optimization of the currently available components, as well as with the development of future components. A better understanding of DFA will also help widen the scope of applications of this accelerator."

It's worth noting that LightOn's hardware, which is designed to be plugged into a standard server or workstation, isn't immune to the limitations of optical processing. Speedy photonic circuits require speedy memory, and then there's the matter of packaging every component, including lasers, modulators, and optical combiners, onto a tiny chip. Plus, questions remain about which nonlinear operations, the basic building blocks that enable models to make predictions, can be executed in the optical domain.

That's perhaps why companies including Intel and LightOn itself are pursuing hybrid approaches that combine silicon and optical circuits on the same die, so that parts of a model run optically and parts run electronically. They're not alone: startup Lightelligence has demonstrated an MNIST handwritten-digit recognition model on its accelerator. And Lightmatter, Optalysis, and Fathom Computing, three other startups vying for a slice of the budding optical chip market, have raised tens of millions of dollars in venture capital for their own chips.
