Deci’s NLP model clocks 100,000 queries per second in latest MLPerf results

Deci, a deep-learning software maker that uses AI-powered tools to help teams create and deploy AI models at scale, today announced that a natural language processing (NLP) model generated by its in-house technology has clocked over 100,000 queries per second in MLPerf Inference v3.0 benchmark results.

Deci said the result is the highest NLP inference throughput ever published at MLPerf. For reference, other submitters’ throughput (queries per second) in the same category was about seven times lower.

The results from the Israeli company come as it tries to position itself as a facilitator of AI applications for enterprises, competing against the likes of Matlab, Dataloop and Deepcube.

What is MLPerf?

Launched by leaders from academia, research labs and tech giants, MLPerf is a benchmark suite that evaluates training and inference performance for hardware, software and services. For the latest inference test, Deci generated a model with its automated neural architecture construction (AutoNAC) technology and submitted it under the offline scenario in MLPerf’s open division in the BERT 99.9 category.

The AutoNAC engine enables teams to develop hardware-aware model architectures tailored to specific performance targets on their inference hardware. In this case, the company used it to generate architectures tailored for various Nvidia accelerators. The goal was to maximize throughput while keeping accuracy within 0.1% of the baseline of 90.874 F1 (SQuAD).
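To make the accuracy constraint concrete, here is a minimal sketch of the arithmetic. It assumes the 0.1% margin is relative to the baseline (that is, a model must reach 99.9% of 90.874 F1); the function names are illustrative and do not come from Deci or MLPerf.

```python
# Baseline F1 on SQuAD, as quoted in the article.
BASELINE_F1 = 90.874
# Assumption: "BERT 99.9" means 99.9% of the baseline score (a relative margin).
RELATIVE_TARGET = 0.999


def accuracy_floor(baseline: float, relative_target: float) -> float:
    """Return the lowest F1 score that still meets the relative target."""
    return baseline * relative_target


def meets_target(f1: float) -> bool:
    """Check a candidate model's F1 against the assumed accuracy floor."""
    return f1 >= accuracy_floor(BASELINE_F1, RELATIVE_TARGET)


if __name__ == "__main__":
    print(f"Minimum F1: {accuracy_floor(BASELINE_F1, RELATIVE_TARGET):.3f}")
```

Under that reading, a submission would need an F1 of roughly 90.78 or better to qualify.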

How did Deci's NLP model do in tests?

On the Nvidia A30 GPU, Deci’s model delivered a throughput of 5,885 QPS per TeraFLOPs, while other submissions clocked just 866 QPS. Similarly, on the Nvidia A100 80GB GPU and the Nvidia H100 PCIe GPU, throughput stood at 13,377 QPS and 17,584 QPS, respectively, again significantly higher than that delivered by other submitters (1,756 QPS and 7,921 QPS). In all three cases, the accuracy was higher than the targeted baseline.

[Chart: The orange line represents the highest throughput results achieved by other MLPerf submitters on the same hardware and within the same BERT 99.9 category.]

The gap was most pronounced when the models were put to the test on eight Nvidia A100 GPUs. In this case, Deci’s NLP model handled 103,053 queries per second per TeraFLOPs, about seven times the throughput of other submissions (13,967 QPS), with higher accuracy.

“With Deci’s platform, teams no longer need to compromise either accuracy or inference speed and achieve the optimal balance between these conflicting factors by easily applying Deci’s advanced optimization techniques,” said Ran El-Yaniv, Deci’s chief scientist and cofounder.

The company also added that these results show that teams using its technology can achieve higher throughput while scaling back to lower-priced hardware, like going from A100 to A30.
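The reported figures can be sanity-checked with a few divisions. The sketch below takes the throughput numbers exactly as quoted in this article and treats them as directly comparable, as the article does; the dictionary layout and function name are illustrative only.

```python
# (Deci throughput, best throughput from other submitters), as quoted in the article.
RESULTS = {
    "A30": (5885, 866),
    "A100 80GB": (13377, 1756),
    "H100 PCIe": (17584, 7921),
    "8x A100": (103053, 13967),
}


def speedup(deci_qps: float, other_qps: float) -> float:
    """Ratio of Deci's reported throughput to the comparison figure."""
    return deci_qps / other_qps


if __name__ == "__main__":
    for hardware, (deci, other) in RESULTS.items():
        print(f"{hardware}: {speedup(deci, other):.1f}x")

    # Cheaper-hardware claim: Deci on an A30 versus other submitters on an A100.
    deci_a30 = RESULTS["A30"][0]
    others_a100 = RESULTS["A100 80GB"][1]
    print(f"Deci A30 vs. others' A100: {speedup(deci_a30, others_a100):.1f}x")
```

The ratios work out to roughly 6.8x on the A30, 7.6x on the A100, 2.2x on the H100 and 7.4x on eight A100s, which is consistent with the "about seven times" figure for the multi-GPU run. Deci's A30 result is also about 3.4 times the best competing A100 figure, which is the basis for the lower-priced hardware claim.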

The benchmark results come just a month after Deci debuted a new version of its AutoNAC-powered deep learning development platform with support for generative AI model optimization. Currently, the company works with enterprises like Ibex, Intel, Sight and RingCentral and claims to shorten the AI development process by up to 80% while lowering development costs per model by 30% on average.
