Join top executives in San Francisco on July 11-12, to hear how leaders are integrating and optimizing AI investments for success. Learn More
The most advanced generative AI models in the world, like Stable Diffusion, generally run only in the cloud.
But what if that same model could run on a smartphone in your pocket? That’s the challenge that Qualcomm engineers have tackled. In research released today, Qualcomm revealed that, using a combination of software techniques and hardware optimization, it was able to shrink Stable Diffusion enough to run inference on common Android smartphones.
Stable Diffusion is developed by startup Stability AI and is one of the most popular generative AI models for image creation in use today, often competing against OpenAI’s DALL-E.
To be clear, training generative AI models requires massive computing resources and is not going to happen on a smartphone. Rather, Qualcomm has worked on the inference side, that is, the “generative” piece, in which a new image is created from the pretrained model. To date, users have been able to generate Stable Diffusion–based images on their phones only indirectly, through a mobile app or browser that accesses a cloud service to generate the image. What Qualcomm is now demonstrating is the ability to generate Stable Diffusion images directly on an Android smartphone, without calling out to the cloud to do the heavy lifting.
“For privacy and security, when entering queries through a cloud API for Stable Diffusion, all your information or ideas are sent to the cloud server of some company,” Jilei Hou, VP, engineering at Qualcomm Technologies, told VentureBeat. “With on-device AI, that issue goes away since all your ideas stay solely on the device.”
Hou noted that for enterprise use of generative AI, this could be an even bigger issue, since confidential company information needs to be protected.
Hardware alone isn’t enough to run generative AI
The demo that Qualcomm built to prove out its capabilities is running on a Qualcomm Reference Design device with the latest Snapdragon 8 Gen 2 Mobile Platform, which is in many commercial devices today.
Hou said the inferencing is done on the Hexagon Processor, a completely custom design for AI acceleration by Qualcomm engineers that is part of the Snapdragon 8 Gen 2 silicon.
While Qualcomm’s silicon is powerful for a mobile device, running Stable Diffusion directly on a smartphone presents a series of challenges. For one, Hou noted that the model has more than 1.1 billion parameters and that the associated computation is more than 10 times that of the typical workloads run on a smartphone.
“This is the biggest model that we have run on a smartphone,” Hou said. “All the full-stack optimizations that we made were very important to make the model fit and run efficiently.”
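To put the parameter count in perspective, here is a rough back-of-the-envelope sketch in Python of how much storage the weights alone would need at different precisions. The 1.1 billion figure comes from Hou; the FP16 and 8-bit rows are illustrative assumptions, not precisions stated in this article, and the function name is hypothetical.

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# "Over 1.1 billion parameters," per Hou; treated here as a lower bound.
PARAMS = 1_100_000_000

# FP32 is the starting precision named in the article; the other two
# rows are assumptions added only for comparison.
for label, bits in [("FP32", 32), ("FP16 (assumed)", 16), ("8-bit (assumed)", 8)]:
    print(f"{label}: {weight_footprint_gb(PARAMS, bits):.1f} GB")
```

At FP32 that works out to roughly 4.4 GB for the weights before counting activations or the app itself, which helps explain why shrinking the model matters on a phone.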
How Qualcomm shrank Stable Diffusion to run on Android
The optimizations that were required involved heavy use of the Qualcomm AI Stack, which is a portfolio of AI tools designed to help optimize models and workloads.
Hou explained that for Stable Diffusion, his team started with the FP32 version 1-5 open-source model from Hugging Face and made optimizations through quantization, compilation and hardware acceleration to run it on a phone powered by the Snapdragon 8 Gen 2 Mobile Platform.
To shrink the model, his team used the AI Model Efficiency Toolkit’s (AIMET) post-training quantization capabilities.
“Quantization not only increases performance, but also saves power by allowing the model to efficiently run on our dedicated AI hardware and to consume less memory bandwidth,” Hou said.
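For readers unfamiliar with the idea, the following is a minimal sketch of post-training quantization in plain Python. It is not AIMET's API or Qualcomm's actual method; it assumes a simple per-tensor affine scheme with 8-bit unsigned integers, and the function names and sample weights are made up for illustration.

```python
from typing import List, Tuple


def quantize(weights: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Map floats onto unsigned integers using one scale and zero point per tensor."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo, hi = min(min(weights), 0.0), max(max(weights), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    # Round to the nearest step, shift by the zero point, clamp to the integer range.
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [(v - zero_point) * scale for v in q]


# Hypothetical weights, chosen only to show the round trip.
weights = [-0.82, -0.11, 0.0, 0.27, 1.35]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zp)
print(f"max round-trip error: {max_err:.4f}")
```

Each weight is stored as a small integer instead of a 32-bit float, at the cost of a rounding error of about half a quantization step. Production toolkits add calibration and error-correction steps on top of this basic mapping.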
For compilation, the Qualcomm AI Engine Direct framework was used to map the neural network into a program that runs efficiently on the smartphone hardware. Hou noted that the overall optimizations made in the Qualcomm AI Engine have significantly reduced runtime latency and power consumption. He added that all the work done to get Stable Diffusion running well on the smartphone will benefit future iterations and users of the Qualcomm AI Stack.
Looking forward, Hou said Qualcomm will build on lessons learned to bring other large generative AI models (for example, GPT-like models) from the cloud to the device. He added that the optimizations that let Stable Diffusion run efficiently on phones can also be applied to other platforms like laptops, XR headsets and virtually any other device powered by Qualcomm Technologies.
“Running all the AI processing in the cloud will be too costly, which is why efficient edge AI processing is so important,” Hou said. “Edge AI processing ensures user privacy while running Stable Diffusion and other generative AI models since the input text and generated image never need to leave the device — this is a big deal for the adoption of both consumer and enterprise applications.”