Presented by Qualcomm Technologies, Inc.

AI is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences, but fundamental research is required to advance AI further and speed up adoption. Join this VB Live event to learn about the advances that are quickly making AI truly ubiquitous.

Access this VB Live event on demand for free right here.

Artificial intelligence is revolutionizing not just industries, but the way we think about computing in general. It’s touching everything from devices to services to appliances, and it’ll affect us in ways that we aren’t even aware of yet. There are so many ways to approach AI, and ongoing research into its potential is already changing the world.

We’re in an exciting era where we have a massive number of intelligent wireless edge devices, says Dr. Jilei Hou, senior director of engineering at Qualcomm Technologies. As more devices become intelligent and connected, they’ll also begin to reach their limits in terms of the volume of data sent to the cloud.

In order to scale, and also address the important issues of privacy and security, AI will need to be distributed from the cloud to the wireless edge, enabling on-device processing, sensing, security, and intelligence.

“AI in terms of training and inference is largely handled in the cloud; it’s a very cloud-centric view,” Hou says. “As of today, we have seen a healthy wave of AI workloads, use cases, and applications that happen on the device as well, where they are powered by power-efficient on-device AI capabilities.”

In the future, with 5G connectivity, where high speed and ultra-low latency become available, we’ll see the advent of a fully distributed system where the AI workload can be flexibly deployed either in the cloud or on the device. Lifelong on-device learning capability will mature over time, making such a fully distributed system even more powerful.

The central mission of Qualcomm’s AI research team is to make sure all these core capabilities, including perception, reasoning, and action, can be truly ubiquitous across devices and industries, including mobile, auto, IoT, and cloud. They’re developing the kind of common platform that will be fundamental to scale AI across the industry and across companies, based on three pillars: power efficiency, personalization, and efficient learning.

Power efficiency is about developing deep learning techniques to allow efficient on-device compute and achieve power and error efficiency. Personalization is about developing machine learning technology to allow adaptation to user behavior and preference. Efficient learning is the ability to train with little or no data annotation, or to handle model robustness over virtual samples.

The team is applying its research in generalized CNNs, Bayesian deep learning and optimization, deep generative models, and reinforcement learning to use cases such as deep learning for graphics, computer vision, video recognition, voice UI, and even fingerprints. To make sure AI capability can be scaled across industries, they’re developing platform techniques and tools such as neural network quantization, compression, kernel compilation, and compute-in-memory.

Power efficiency innovations

Power efficiency is about addressing all the computation and power consumption bottlenecks, which usually occur on both ends. Running the data through the neural model, where all the math computation happens, obviously contributes to power consumption. But once you load all the weights and activations in and out of memory, the data transfer between the memory and the compute engine can end up dominating the power consumption in many cases, especially in LSTM or transformer models. The team has found that compression, quantization, compilation, and compute-in-memory can be combined so that their efficiency improvements compound.

“With compression, quantization, and compilation, if we can achieve power efficiency improvement respectively at 3X, 4X, and 4X each, then you can just multiply them together and imagine a power efficiency improvement on the order of 50X,” Hou explains.
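The arithmetic behind that estimate can be checked in a few lines. This is only a sketch: it assumes, as the quote does, that the three gains are independent and simply multiply, and the names `gains` and `compound_gain` are made up for illustration.

```python
from math import prod

# Per-technique improvements as quoted in the article.
# Assumption: the gains are independent, so they multiply.
gains = {"compression": 3.0, "quantization": 4.0, "compilation": 4.0}


def compound_gain(factors):
    """Multiply individual efficiency factors into one overall factor."""
    return prod(factors)


total = compound_gain(gains.values())
print(f"combined improvement: {total:.0f}X")  # 48X, on the order of 50X
```

In practice the techniques may interact, so the product is best read as an upper-end estimate rather than a guaranteed result.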

Quantization achievements

Quantization refers to automating the reduction of the precision of the weights and activations while still maintaining the model’s accuracy. In general, scientists who train models use a 32-bit floating-point representation. However, on a mobile device, for example, a very common architecture design can only use 8-bit for the inference engine, though not for long.

“We already have done a good amount of research with promising results that we can transform floating point 32 into a quantized INT8 result, where we can still essentially keep very comparable accuracy,” Hou says. “We can achieve more than a 4X increase in the performance per watt, for a savings in memory and compute.”
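To make the idea concrete, here is a minimal sketch of one common scheme, symmetric per-tensor quantization to signed 8-bit integers. It is not Qualcomm's method, which the article does not describe; the weights are made-up values and the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integers."""
    return [x * scale for x in q]


weights = [0.82, -1.27, 0.05, 0.40, -0.63]  # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 weights:", q)
print("max rounding error:", max_error)
print("storage reduction:", 32 // 8, "x")  # 32-bit floats vs 8-bit integers
```

The 4x figure printed here is only the storage ratio between 32-bit and 8-bit values. The performance-per-watt gain Hou cites also depends on the hardware, which this sketch does not model.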

Paradigm shifts in hardware architecture design

Typically, memory and compute are two separate engines, and the data transfer between them is one of the greater performance bottlenecks. But what if you can combine these two building blocks?

“Essentially, we can embed compute engines inside memory bit cells so we can enable compute all inside of memory together in an analog compute manner,” Hou says. “If we focus on one-bit ops, we can achieve potentially up to 100X power efficiency improvement.”
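The article does not say how the one-bit operations are implemented in the analog memory array. As a rough digital analogy only, the sketch below shows why one-bit arithmetic is cheap: a dot product of two vectors of +1/-1 values reduces to an XNOR followed by a bit count. The function names and sample vectors are assumptions for illustration.

```python
def pack_signs(values):
    """Pack +1/-1 values into an int; bit i is set when values[i] is +1."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")
    # Each match contributes +1 and each mismatch contributes -1.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
fast = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(reference, fast)
```

Both values agree, which is the point: no multiplier is needed, only bitwise logic and counting.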

Want to know more?

Qualcomm continues to advance AI research to make power-efficient AI ubiquitous from device to cloud, conduct leading research and development across the entire spectrum of AI, and develop a powerful AI platform, which is fundamental to scaling AI use cases across the industry and companies. And Hou covered far more in this recent webinar.

To dive more deeply into each of these topics, learn how a developer, OEM, or ISV can take advantage of some of these AI model optimization technologies, and get the details on the exciting new ecosystem and architecture advances coming your way, catch up on this VB Live event!

Access for free, on demand right here.

In this webinar, we’ll discuss:

  • Several research topics across the entire spectrum of AI, such as generalized CNNs and deep generative models
  • AI model optimization research for power efficiency, including compression, quantization, and compilation
  • Advances in AI research to make AI ubiquitous


Speakers:

  • Jilei Hou, Sr. Director, Engineering, Qualcomm Technologies, Inc.
  • Jack Gold, Principal Analyst, J.Gold Associates, LLC.