The power of AI models has long been correlated with their size, with models growing to hundreds of billions or trillions of parameters. But very large models come with obvious trade-offs for enterprises, including lack of control over the underlying system, reliance on third-party clouds, and unpredictable costs. 

Now, a counter-trend is accelerating with small language models (SLMs) designed to run locally on PCs and phones. The latest and most potent example is Meta’s MobileLLM-R1, a family of sub-billion parameter models that deliver specialized reasoning. Its release is part of a wider industry push for developing compact, powerful models that challenge the “bigger is better” narrative.

Meta’s MobileLLM-R1: an on-device reasoning specialist

Meta’s MobileLLM-R1 is a family of reasoning models that come in 140M, 360M, and 950M parameter sizes and are purpose-built for math, coding, and scientific reasoning (they’re not suitable for general chat applications).

The models owe their efficiency to design choices Meta laid out in the original MobileLLM work, optimized specifically for sub-billion-parameter architectures. For example, they use a "deep-and-thin" architecture (favoring a larger number of layers over larger embedding dimensions) and established techniques like grouped-query attention, in which several query heads share the same key and value projections to reduce the model's parameter count. These choices, combined with a remarkably efficient training process, allow the models to perform complex tasks on resource-constrained devices.
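To make the parameter savings from grouped-query attention concrete, here is a hedged back-of-the-envelope sketch. The dimensions are purely illustrative, not MobileLLM-R1's actual configuration, and biases and other layers are ignored.

```python
# Illustrative only: parameter count of one attention block with standard
# multi-head attention (MHA) vs. grouped-query attention (GQA).
# Dimensions are made up, not MobileLLM-R1's real configuration.

def attention_params(d_model: int, n_heads: int, n_kv_heads: int) -> int:
    """Parameters in the Q/K/V/output projections (biases omitted)."""
    head_dim = d_model // n_heads
    q_proj = d_model * (n_heads * head_dim)          # one query projection per head
    kv_proj = 2 * d_model * (n_kv_heads * head_dim)  # K and V shared across groups
    out_proj = (n_heads * head_dim) * d_model
    return q_proj + kv_proj + out_proj

d_model = 1024
mha = attention_params(d_model, n_heads=16, n_kv_heads=16)  # every head has its own K/V
gqa = attention_params(d_model, n_heads=16, n_kv_heads=4)   # 4 query heads share each K/V
print(f"MHA: {mha:,}  GQA: {gqa:,}  savings: {1 - gqa / mha:.1%}")
```

With these toy numbers, sharing key/value heads cuts the attention block's parameters by more than a third, which is why the technique is popular in small, on-device models.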

The models were trained on a total of around 5 trillion tokens (compared to the tens of trillions used by other similarly sized models), including data distilled from Llama-3.1-8B-Instruct, which transferred that larger model's advanced reasoning capabilities without the massive training cost.

MobileLLM-R1 performance

MobileLLM-R1 matches or outperforms other models of similar size on key reasoning benchmarks (source: Hugging Face)

The 950M model slightly outperforms Alibaba's Qwen3-0.6B on the MATH benchmark (74.0 vs 73.0) and establishes a clear lead on the LiveCodeBench coding test (19.9 vs 14.9). This makes it ideal for applications requiring reliable, offline logic, such as on-device code assistance in developer tools. 

However, there is a major catch: the model is released under Meta’s FAIR Non-Commercial license, which strictly prohibits any commercial use of the model or its outputs. For businesses, this positions MobileLLM-R1 as a powerful research blueprint or an internal tool rather than a production-ready asset that can be monetized. (Meta’s Hugging Face page says “MobileLLM-R1 is FAIR NC licensed as of now,” which means it might change in the future.)

The competitive landscape of small language models

While MobileLLM-R1 pushes the performance boundary, the broader SLM landscape offers commercially viable alternatives tailored to different enterprise needs. Google’s Gemma 3 270M, for instance, is an ultra-efficient workhorse. At just 270 million parameters, it is designed for extreme power savings. Internal tests showed 25 conversations consumed less than 1% of a phone’s battery. Its permissive license makes it a strong choice for companies looking to fine-tune a fleet of tiny, specialized models for tasks like content moderation or compliance checks.

For businesses needing strong, out-of-the-box reasoning without licensing restrictions, Alibaba's Qwen3-0.6B is a leading contender. With its Apache-2.0 license and performance that rivals MobileLLM-R1, it serves as a practical, commercially ready alternative. Meanwhile, other players are focusing on enterprise control and new capabilities. Nvidia's Nemotron-Nano offers unique "control knobs" that let developers toggle reasoning on or off with simple commands and set a "thinking budget" to balance speed and accuracy. Liquid AI is pushing into on-device multimodality with models that handle both text and vision, paired with a developer kit designed for rapid deployment; the company is also experimenting with "liquid neural networks," a novel architecture that promises to cut the compute and memory costs of running highly capable AI systems.

From a ‘god model’ to a ‘fleet of specialists’

This industry-wide shift to SLMs is a direct response to enterprise pain points. The reliance on large, third-party cloud models creates unpredictable costs and gives companies little control over model updates or deprecations. As Liquid AI CEO Ramin Hasani told VentureBeat in a recent interview, "Enterprises are realizing that small models offer predictability. Instead of paying per API call, you can license a model once and use it infinitely on-device.” This move also solves for privacy and reliability, as processing sensitive data locally enhances compliance and ensures applications work without a constant internet connection. The potential impact is significant, with Hasani seeing a "trillion-dollar opportunity in the small model regime by 2035."

The availability of capable SLMs enables a new architectural playbook. Instead of relying on one massive, general-purpose model, organizations can deploy a fleet of specialist models. This strategy, similar to the software industry’s shift from monolithic applications to microservices, involves breaking down complex problems into a series of smaller, repetitive subtasks, each handled by a fine-tuned SLM. 

This approach aligns with the gradual shift toward agentic applications, where tasks are distributed across specialized agents. According to Nvidia researchers, the current dominance of large language models (LLMs) is often "excessive and misaligned with the functional demands of most agentic use cases." A fleet of specialists, by contrast, maps naturally onto how AI agents work and delivers lower costs, higher speed, and clearer visibility when failures occur.

This doesn't make large models obsolete. On the contrary, they will take on a new role. As AI researcher Andrej Karpathy explains, today's largest models are powerful enough to “refactor and mold the training data into ideal, synthetic formats” that can be used to distill pure reasoning skills into smaller, more efficient successors. This creates a symbiotic relationship where each generation of massive models helps create the perfect training sets for the next generation of agile SLMs, making AI development more sustainable.
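Distillation comes in several flavors. The process described above works at the data level (a large model writes synthetic training examples), but the classic logit-matching variant is the easiest to sketch: the student is penalized for diverging from the teacher's softened output distribution. The toy below uses pure Python and made-up logits purely to show the core idea.

```python
# Toy sketch of logit-based knowledge distillation. Logits are invented
# for illustration; real pipelines operate on full model outputs at scale.

import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

teacher = [4.0, 1.0, 0.5]
aligned = [3.8, 1.1, 0.4]   # student close to the teacher's preferences
mismatched = [0.1, 2.0, 3.0]  # student far from the teacher

# Training would minimize kd_loss, pulling the student toward `aligned`.
print(kd_loss(teacher, aligned), kd_loss(teacher, mismatched))
```

Minimizing this loss transfers the teacher's "preferences" over outputs, which is why a well-distilled small model can inherit reasoning behavior far beyond what its token budget alone would suggest.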

The simultaneous moves by Meta, Google, Nvidia, and others confirm the industry is embracing a more pragmatic AI future. The "bigger is better" era isn't over, but it is no longer the only game in town. The value proposition of small models is now undeniable. Not every problem needs a 175-billion-parameter hammer. For a growing number of enterprise use cases, the sharper tool is the smaller one.